Formatting Code (Properly == Consistently)

The first rule of formatting is that code should be consistent. That means that the code should be consistent within itself — that is, within its files — and across the code base.

And consistent with the language itself, or more accurately, the framework and libraries of the language.

Programmers should not try to make a statement with the format of their code. I’ve seen hideous, oddly-formatted code where it seems that the programmers wanted simply to look different, since their alternative style made no improvement in the legibility of the code. Sure, it’s different. But the problem is: it’s different. There is cognitive friction in that I’m reading code that doesn’t look like the other code in that language. It doesn’t look (exactly) like a JDK class. Or a Ruby library. Or part of the C++ STL. It’s different.

One of my primary tenets of programming (and a lot of other things) is: change only one variable at a time.

Nonstandard formatting violates that principle because it doesn’t take the predominant formatting style of a language as a constant, and instead treats it is a variable. So as a reader of that code I must grapple with two variables: the code itself, which is new (thus a variable) to me, and its nonstandard format, also new/different to me, not matching my expectations of how code in the language is supposed to look.

Yes, I know I can just reformat the code. And we can read files with a chain of input stream objects instead of, oh, something like IO.readlines. In the words of Alfred North Whitehead: “Civilization advances by extending the number of important operations which we can perform without thinking about them.”

Said inversely, this means that our experience worsens with the number of variables we have to think about, with which I’m sure any programmer would agree. And that includes code formatting.

A Day Refactoring Code, Part 3: Naming

“Naming is the essence of programming.” -Fred Brooks, The Mythical Man-Month

Continuing from part 2 of this series, let’s now revisit our names, now that we have a better understanding of the behavior here.

PosString hasn’t grown on me as a name, and taking a bit of inspiration from the GString in Groovy, let’s rename it PString, which suggests “Processable String” and “Positionable String”. We’ll also improve the field names “pos” and “len” by renaming them as “position” and “length”. “str” is also quite short, but is a common name for a string, whereas “pos” and “len” are less common.

Going back to ParsingSpellChecker (which I’ll refer to as PSC), PString now has all of its low-level string processing functionality. There are methods in ParsingSpellChecker that seem more specific to its domain, such as skipLink().

The edge condition here would be a method such as skipThroughWord(), which advances to the next character that is not a letter or digit. As the descriptions of the methods get longer, they are more likely to be too specific for usage in a general class.

So that’s it for PString. Let’s start migrating ParsingSpellChecker to use it. First is that we’ll keep the instance fields as public, and just switch the
references:

            this.pstr = new PString(str);
            this.str = pstr.str;
            this.len = pstr.length;

Through the code we replace “this.pos” with “this.pstr.position”, so again the low-level behavior is the same, just with ParsingSpellChecker being highly coupled to PString, for now.

We’ll do likewise with the len/length fields, so now our PSC methods have code such as:

    protected void consumeTo(String what) {
        while (this.pstr.position < this.pstr.length && this.pstr.position + what.length() < this.pstr.length && !this.str.substring(this.pstr.position).startsWith(what)) {
            ++this.pstr.position;
        }
    }

We’re midstream on this task, but given our status (show by running the alias for “git diff –stat HEAD”) we see that we have 52 changed lines of code, which is a reasonable amount of code for a checkin:

~doctorj% gitdfs
 .../org/incava/text/spell/ParsingSpellChecker.java |   52 +++++++++-----------
 1 files changed, 23 insertions(+), 29 deletions(-)

We check it in and get back to work:

~doctorj% git commit -m "Began transition for ParsingSpellChecker to use PString." .
[master 95db8c1] Began transition for ParsingSpellChecker to use PString.
 1 files changed, 23 insertions(+), 29 deletions(-)

We remove this.str references, replacing them with “this.pstr.str”, then began “flipping” the methods in PSC to call their PString equivalents, such as this change (diff) for currentChar():

     protected Character currentChar() {
-        return this.str.charAt(this.pstr.position);
+        return this.pstr.currentChar();
     }

We make the transition a bit easier by renaming “consume()” to “consumeFrom()”, matching our PString method advanceFrom().

We go through the PSC code, replacing usage of PString fields with accessors and the equivalent mutators, such as in skipBlanks():

    // before

    protected void skipBlanks() {
        while (this.pstr.position + 2 < this.pstr.length && currentChar() != '<' && !this.str.substring(this.pstr.position, this.pstr.position + 2).equals("{@") && !Character.isLetterOrDigit(currentChar())) {
            ++this.pstr.position;
        }
    }

    // after

    protected void skipBlanks() {
        while (this.pstr.hasChar() && !this.pstr.isMatch("<") && !this.pstr.isMatch("{@") && !Character.isLetterOrDigit(currentChar())) {
            this.pstr.advancePosition();
        }
    }

That’s still the worst offender in terms of “width”, the number of characters wide a line is, which is a good way to find candidate code to be rewritten “skinnier”. (Other ways to find candidate code: by number of lines, and by depth of nested blocks, a variant of width).

In the next part of this series we’ll wrap it up with a look at performance and code quality.