Quite a while ago I first read Object Calisthenics, by Jeff Bay, in the Thoughtworks Anthology, and it has resulted in pivotal change in how I write software. A testimony to its value to me is that I’ve reread it several times, putting it into what I would consider my desert-island set of books.
A PDF of it is available here.
Initially some of the suggestions struck me as odd, but overall I agreed with them. With the 1.3.0 rewrite of DiffJ, I decided to make more of an effort to put all of the principles into practice, and analyze and review the experience.
The difference was amazing.
The idea of Object Calisthenics is that the high-level concepts and principles of good object-oriented programming can be distilled into low-level rules to follow.
The rules are:
- One level of indentation per method
- Don’t use the “else” keyword
- Wrap all primitives and strings
- First class collections
- One dot per line
- Don’t abbreviate
- Keep all entities small
- No classes with more than two instance variables
- No getters/setters/properties
Some of those rules sound familiar (keeping entities small, avoiding getX/setX), but others seem bizarre: don’t use else? wrapping primitives? no more than two instance variables?
But they work, and make sense.
1. One level of indentation per method.
This rule is simple, and applied, greatly reduces complexity. I avoid long methods — never more than a vertical screenful — and “wide” methods should be equally avoided. A good metric is to consider the complexity of a method equal to the product of its length (lines of code) and its depth (indentation). (Maybe one of these days I’ll revive statj, for Java statistical summaries.)
This rule also reduces complexity because it pushes code out of the depths of the problematic method to the objects being acted on. Thus a refactoring process might go from the original code of:
public void moveEntities(List<Entity> entities) { for (Entity entity : entities) { if (isOnScreen(entity)) { for (Entity other : entities) { if (entity != other && entity.nextLocation().equals(other.location())) { collide(entity, other); } else { moveEntity(entity); } } } } }
to:
public void moveEntities(List<Entity> entities) { for (Entity entity : entities) { processEntityLocation(entity, entities); } } public void processEntityLocation(Entity entity, List<Entity> entities) { if (!isOnScreen(entity)) { return; } Entity other = getCollisionEntity(entity, entities); if (other == null) { moveEntity(entity); } else { collide(entity, other); } } public Entity getCollisionEntity(Entity entity, List<Entity> entities) { for (Entity other : entities) { if (entity != other && entity.location().equals(other.location())) { return other; } } return null; }
Note the two levels of indentation in the getCollisionEntity method. I haven’t found a way to do the for-if-return loop with only one level.
The primary motivation of this rule is that one is not dealing with multiple levels — both indentation and abstraction — at once.
The bottom line: it works. I’ve written much more straightforward code this way, and the result is small methods that can then be pushed out of the current class to the primary object being acted upon, which generally, is the first parameter of the method.
For example, we could move getCollisionEntity to the Entity class:
public class Entity { public Entity getCollisionEntity(List<Entity> entities) { for (Entity other : entities) { if (this != other && location().equals(other.location())) { return other; } } return null; } // ... }
2. Don’t use the “else” keyword
The argument is to reduce the number of branches, keeping long conditional blocks out of the code.
This seems less justified than the other rules, but one way I have found it beneficial is to use the ternary operator instead of if-else, which has two results: very simple statements, and concise methods. Of course, this only works if the return types are compatible.
Modifying the code above, the result is:
return other == null ? moveEntity(entity) : collide(entity, other);
I find that much easier to read, and it avoids the vertical code sprawl.
My spin on this rule is to avoid the “else if” sequence, that is, the long switches as seen below, which can oftentimes be refactored to be more polymorphic.
if (x) { .... } else if (y) { .... } else if (z) { .... }
3. Wrap all primitives and strings
This I found difficult to get used to, but now I’m a vehement advocate, for a reason not cited in the original paper.
The reason he states is for type-checking and conveying information to another programmers about what the variable actually is.
If you have type checking, use it. A (the?) strength of Java is the compiler (OK, the JVM too), which can find our silly errors, such as:
void launchMissile(boolean isNuclear, boolean isTest, Coordinates target); // run a non-nuclear missile test: launchMissile(false, target, false);
It’ll give us a nice helpful friendly message, such as:
Norad.java:13: launchMissile(boolean,boolean,java.lang.String) in mil.norad.Norad cannot be applied to (boolean,java.lang.String,boolean)
However, the compiler can’t help us beyond that. Such as:
// run a non-nuclear missile test: launchMissile(true, false, target);
It might seem like overkill (pun intended) to wrap the boolean (or use enums), but look at the result:
void launchMissile(MissileType missileType, TestStatus testStatus, Coordinates target); // run a non-nuclear missile test: launchMissile(MissileType.NON_NUCLEAR, TestStatus.IS_TEST, target);
And, going back to Ruby, because Ruby (unlike Java) allows subclassing of all built-in types, it’s easy to simply derive a new type, and eventually to expose the necessary methods of the superclasses as delegators in the new class, and then to un-derive the class from the primitive type.
The original paper highlights the benefit of wrapping primitive types as for conveying meaning (to the compiler and to the programmer), but I think there’s a more important reason, and benefit, for avoiding primitive types: the fact that it is nearly certain that the behavior of the primitive type will not match how a variable is used.
For example, in a given body of code, one might write this:
String userName;
That is saying that a user name is a string. Okay, let’s see if that’s right:
- Strings can have zero length. User names cannot.
- The length of a String can be 2,147,483,647. User names cannot.
- Strings can contain spaces. User names cannot.
- … etc. …
Another problem with this is that Strings have functionality (methods) that makes no sense for a user name. lastIndexOf? replaceAll? split? java.lang.String has 69 constructors and instance methods, and a user name would use only a few of those.
Thus the declaration “String userName” is quite incorrect (as in “lying”). One might say that “a user name is a string, but with certain restrictions, and with only a subset of the functionality of java.lang.String, and with additional behavior and associations.” That is, there (at least conceptually, if not in the code) is a class UserName, which is approximately, but hardly exactly, a string. And the more approximate that equivalence is, the greater the burden on the programmer and on other code to perform the mapping.
4. First class collections
From the original: "any class that contains a collection should contain no other member variables." Said differently, that means that a collection should be wrapped in a class specific to how the collection is used.
Thus the logic for using a collection goes into the wrapper class, instead of in the "client" code. This feels more like working with an object instead of in it.
For example, a set of scores (mapped from a name to a list of scores) could be implemented as a simple class, with the average and total methods added:
class ScoreList def initialize @scores = Array.new end def << score @scores << score end def [] index @scores[index] end def total @scores.inject(0) { |sum, n| sum + n } end def size @scores.size end def to_s @scores.to_s end end class Scores def initialize @scores = Hash.new { |h, k| h[k] = ScoreList.new } end def add name, score @scores[name] << score end def scores name = nil name ? @scores[name] : @scores.values end def average @scores.empty? ? nil : total / count end def total sum_of_scores { |list| list.total } end def count sum_of_scores { |list| list.size } end def to_s @scores.inspect end def sum_of_scores &blk scores.inject(0) { |sum, list| sum + blk.call(list) } end end
As method parameters in a type-checked language, it self-documents the method, and keeps the variable types shorter than using the equivalent Java generics:
public void process(Map<String, List<Integer>> scores); public void process(Scores scores);
And with the earlier example, the List could be changed to an Entities class, containing the logic for a set of entities.
For example, my original draft of the Scores class had a Scores class, which was a Hash of Strings to Arrays of integers. When I reviewed this post, I realized that that version was exposing the collection type Array, so I rewrote it with ScoreList wrapping the Array class, as well as Scores wrapping the Map class.
As with primitives, another reason not to use collections directly is that because they are general purpose, they contain methods that very likely don’t apply to how you are using the collection. For example, a Java method returning ArrayList implies that all of the methods of ArrayList are valid, including methods to delete elements from the list.
This rule revolutionized my perspective, and is one that I wish I’d learned and begun practicing years ago. All code I’ve written since applying this is much cleaner and feels much more object-oriented than the legacy practice. Now it’s odd to work directly with a core collection. Maybe Java was right in putting their collections a bit out of the way, in java.util instead of java.lang, to discourage them from being used directly.
5. One dot per line
This is similar to the rule about the “depth” of method blocks, in essence keeping a single level of abstraction.
Applying this to the Ruby code above (Ruby tends to be “horizontal”, with methods chained) eliminates the @scores.values.flatten statements:
class ScoreList def average @scores.empty? ? nil : total / all_scores.length end def total all_scores.inject(0) { |sum, n| sum + n } end def all_scores scores.flatten end end
It also tends to make one push the car.getEngine().setSpeed(55) chain into the appropriate classes, letting them do the necessary delegating.
Again, this fixes the problem of dealing with too many levels of abstraction at once, and of working through objects instead of on them.
6. Don’t abbreviate
I just saw a violation of this today, a method named “getPWD()”, to return a password. As a long-time Unix user, I read that as “print working directory”, a problem common with abbreviating: the user has have to understand the context, and switch their mind-set appropriately.
Another problem with abbreviations is that can become eclipsed. A piece of legacy code that I maintain is named “IM” (for interface module), obviously written prior to instant messaging, which claimed that abbreviation. (It is ironic that “eclipse” is a word that has — at least to many programmers — has its original definition eclipsed, thanks to the IDE.)
I also agree with the point that methods with long names tend to be misplaced. A common occurrence of that is when nouns are in the message name itself, such as:
sendMessageToUser(message, user);
which should instead be
message.sendTo(user);
In general, this rule reflects classic object-oriented design: classes should be nouns, and methods should be verbs.
The corollary for the no-nouns-in-methods subrule is that classes should not contain verbs (or nounified verbs), such as “UserManager” and “MessageSender”.
7. Keep all entities small
This rule seems extreme: no classes more than 50 lines, and no packages with more than 10 files.
I definitely like the concept, that large classes tend to become god classes, doing much more work than they should, instead of that behavior being pushed to the acted-upon classes. And I agree that long classes are difficult to read. Notice how IDEs collapse methods? That’s a sign of “fixing” a problem by hiding it.
My corollary to this is that classes should be some minimal length as well. If a class in a verbose language (Java, C++) is too short (10-20 lines or so), it is usually a C-style struct pretending to be a class (just with getX() instead of a public x). Thus it usually has little behavior defined within that class, putting the onus on client code to do the processing.
However, if this rule is applied rigorously, it could make the classes from the core libraries too simple. A string class certainly needs to have a full suite of behavior to be useful. java.lang.String has 69 methods, and the Ruby String class has 131 methods of its own, with another 45 inherited. A guideline is that code should be more concise and focused the closer it gets to the end user.
That said, I look at my projects with a simple metric: the number of lines per class. There should be a relatively narrow range among the line count, deviating little from the average. That is, a nice, narrow bell curve.
A bad sign — and seen all too often — are the projects with 75% of their files containing fewer than 50 lines of code, and 5% with more than 500 lines. Again, that indicates god classes.
Another problem with large classes is that they are simply too complex to test adequately, whereas small classes, with relatively self-contained behavior, are easy to test. And the simple law of economics applies: the lower the cost, the more the consumption. Thus the cheaper it is to write a test, the more that will be written.
8. No classes with more than two instance variables
This is probably the most bizarre rule of all of these, primarily because it goes against standard practice. However, if one applies the above rules, specifically about not using primitives and collections directly, it is much more feasible.
For example, a game could have a unit with the location and movement as:
public class Unit { private int x; private int y; private int direction; private int speed; }
Applying this rule, we could define the Location and Velocity classes, replacing the above with:
public class Unit { private Location location; private Velocity velocity; }
That’s another sign of good design, having variable names that closely match their types.
And as with the above, having the cohesively-defined Location and Velocity classes makes them easier to test, as well as adding behavior to them, such as range validation, such as that a location has to be within the coordinates of a rectangle (such as the screen).
9. No getters/setters/properties
An object should be “set” via its constructor, and low-level changes (meaning to its instance variables) thenceforth should be within the object itself.
Taking the example in rule 8, that would mean that Location could have a move(Velocity) method.
The principle here is that instead of directly manipulating an object (again, working in the object) the object should be “told” what to do, in the spirit of the original concepts of object-oriented programming, sending messages to an object.
The benefit of this rule is that it encourages thinking in terms of what the object does, instead of what the object has, that is, the “verbs” of the object instead of its “nouns”.
I highly encourage everyone to read the original, which is excellent in explaining the mapping from object-oriented principles to the programming practices.
If you liked this post — and who wouldn’t? — you might also like my series of posts about rewriting a class, using many of these rules.