A Day Refactoring Code, Part 2: Unit Testing

Continuing from part 1, we’ll begin writing test code from low-level functionality to higher-level, essentially paralleling the class as it grows in functionality as well.

It might seem like overkill to test very low-level code (such as getX()), but without low-level tests, a programmer may find himself analyzing the code at an improper level, mistaking for complex errors problems that are more fundamental.

Not testing low-level functionality led to a problem when I was prototyping a Javadoc parser (for DoctorJ). I wrote the original prototype (redundancy alert) in Ruby, then began migrating it to JRuby (intending to link it with Java code).

When I switched from Ruby to JRuby, there were failed unit tests, and I assumed that the problem was in my code, not in JRuby. (Yes, I adhere to the principle that “select ain’t broken”.)

Eventually, after much work, I discovered that a bug in the Regexp.last_match functionality in JRuby meant that the very high-level parsing functionality unit tests failed, and it took a lot of effort to drill down through the parsing code to discover that the issue was with the Regexp engine itself. I was thankful for the opportunity to contribute a fix for the issue, which was released with JRuby 1.6.5.

If I had had simpler tests, I probably would have found the issue sooner.

Back to unit testing in general, a good rule is to write tests in the order of the number of instance variables they test, either directly or indirectly. Thus the simplest tests would invoke methods that use only one instance variable (such as getX and setX). Subsequent tests should accrete in complexity, testing more instance variables.

This is essentially taking the concept from integration tests (multiple components) within unit testing itself, seeing variables (and sequences of methods and usage of multiple classes) as components as well. Thus unit testing begins with the smallest *unit* (variables), and expands to the point of being in the gray area where integration tests begin.

Back to this example, the next method to test is the one returning the character at the current position. Again we’ll write a custom assertion, then the test itself.

Implementing the tests such that it advances the position beyond the end of the string results in an exception:

        // out of range:
        str0.advancePosition(12);
        assertCurrentChar(null, str0);
java.lang.StringIndexOutOfBoundsException: String index out of range: 15
	at java.lang.String.charAt(String.java:694)
	at org.incava.text.PosString.currentChar(PosString.java:75)
	at org.incava.text.TestPosString.assertCurrentChar(TestPosString.java:36)
	at org.incava.text.TestPosString.testCurrentChar(TestPosString.java:84)

That shows that currentChar() should be updated to test that it is within bounds:

from:

        return this.str.charAt(this.pos);

to:

        return this.pos < this.len ? this.str.charAt(this.pos) : null;

The next test is for the hasMore method. What is important here is that we are testing the edge condition, for when pos is 1 less than len, is equal to len, and is 1 greater than it.

        str0.advancePosition();
        assertHasMore(true, str0);

        str0.advancePosition(12);
        assertHasMore(true, str0);

        str0.advancePosition();
        assertHasMore(false, str0);

        str0.advancePosition();
        assertHasMore(false, str0);

This seems to work … except that we didn't actually test the edge condition by hand. Why not? Laziness (false), where we just assumed that the string is 13 characters long.

Simplicity

This highlights another tenet of good unit testing: simplicity. The string we tested is so long ("this is a test" is my default string for string/regexp testing) that we didn't do the test "manually".

It's been a while that we've been rewriting the code, so we can consider checking this into our VCS, or we can just keep going.

When I'm on a multi-person project, I (nearly) never check in code that will fail the unit tests. In contrast, in my local environment, I find it easier to restart work after a break (especially when starting work in the morning) to have the project in the state of a failed unit test. This immediately gets me to the latest point of progress, and focuses my attention on the specific problem I was last working on. Buffer lists just aren't enough, since they only show me the latest files that I was working on, but not exactly what.

So we check in the code, then get back to the hasMore method. We made a mistake by, again, starting with overly complicated tests, in this case a string that was too long, so we'll go simple here, adding the field, and amending the method:

    private PosString abc;

    public void testHasMore() {
        assertHasMore(true, abc); // at 'a'
        // no change
        assertHasMore(true, abc);

        abc.advancePosition();  // at 'b'
        assertHasMore(true, abc);

        str0.advancePosition(); // at 'c'
        assertHasMore(true, abc);

        str0.advancePosition(); // off end
        assertHasMore(false, str0);
    }

Did you see the error? That was the first version that I wrote, which led to this failure:

junit.framework.AssertionFailedError: 'this is a test' (2) expected:<false> but was:<true>
	at org.incava.text.TestPosString.assertHasMore(TestPosString.java:44)
	at org.incava.text.TestPosString.testHasMore(TestPosString.java:123)    

‘this is a test’? This error was caused by my not doing a replace-all within the method, instead editing it “by hand”. (I’m convinced that copying and pasting, combined with search and replace, are responsible for 77.1% of software problems.)

We’ll fix our method:

    public void testHasMore() {
        assertHasMore(true, abc); // at 'a'
        // no change
        assertHasMore(true, abc);

        abc.advancePosition();  // at 'b'
        assertHasMore(true, abc);

        abc.advancePosition(); // at 'c'
        assertHasMore(true, abc);

        abc.advancePosition(); // off end
        assertHasMore(false, abc);
    }

Okay, that works, but should it? Is there “more” when the position is at the last character? I say no, so we change the hasMore code to call the method with num == 1:

    public boolean hasNumMore(int num) {
        return this.pos + num < this.len;
    }

We update the test:

        abc.advancePosition(); // at 'c'
        assertHasMore(false, abc);

Now the unit tests run without failure. Note that parameters to methods, and variables in general, have a length in characters proportionate to their scope. It's overkill to write "indexOfTheArray" in a small for-loop, whereas having instance variables named "td" are inadequately descriptive.

Writing the hasNumMore tests gives us a chance to use the Range class from the IJDK library, making for a bit terser and cleaner syntax:

    public void testHasNumMore() {
        for (int i : new Range(0, 10)) {
            assertHasNumMore(true, tenletters, i);
        }
    }

Here we're using another test string, "tenletters", which happens to be ten letters long. Fun, huh?

This is an excellent example of why custom assertions are valuable. Within a loop (checking a list of values), without a message for the assertEquals(), we would not know the value for which the assertion failed. But in this case, with the custom assertion, we get the output showing the value (10) that failed:

junit.framework.AssertionFailedError: pstr: ''tenletters' (0)'; num: 10 expected:<true> but was:<false>
	at org.incava.text.TestPosString.assertHasNumMore(TestPosString.java:51)
	at org.incava.text.TestPosString.testHasNumMore(TestPosString.java:131)

That actually shows that the test is incorrect: there are only 9 more characters from the initial position in the string. So we change the end of the range from 10 to 9 (that is, the range is 0 through 9), rerun, and the tests pass.

From the old class, we will pull the consume and consumeTo, amending their names and their code, starting with advanceTo, which is derived from the old consumeTo:

    public void advanceTo(String what) {
        while (this.pos < this.len && this.pos + what.length() < this.len && !this.str.substring(this.pos).startsWith(what)) {
            ++this.pos;
        }
    }

    public void testAdvanceTo() {
        assertCurrentChar('a', abc);
        abc.advanceTo("b");
        assertCurrentChar('b', abc);
        abc.advanceTo("c");
        assertCurrentChar('c', abc);
        abc.advanceTo("d");
        assertCurrentChar(null, abc);
    }

That raises another question: what happens when we advance to the end of the string? Should the current character be the last one, or should it be null?

We'll amend it to be null, and clean up the code, which has too long of a conditional (and redundant, with the "pos < len && pos + n < len" statements,
adding a new method, isMatch:

        while (this.pos < this.len && !isMatch(what)) {
            this.pos++;
        }

    public boolean isMatch(String what) {
        return this.pos + what.length() <= this.len && this.str.substring(this.pos).startsWith(what);
    }

Bringing over the old consumeFrom method gives us a chance to use the new isMatch method, for clearer code:

    protected boolean consume(String what) {
        skipBlanks();
        if (this.pos + what.length() < this.len && this.str.substring(this.pos).startsWith(what)) {
            this.pos += what.length();
            return true;
        }
        else {
            return false;
        }
    }

    public boolean advanceFrom(String what) {
        if (isMatch(what)) {
            advancePosition(what.length());
            return true;
        }
        else {
            return false;
        }
    }

This is in keeping with the principle of having fewer dots per line. more dots indicates a violation of the Law Of Demeter, with excessive low-level access to instance variables.

Notice that this method, as we've advanced through the class, is using no instance variables directly. Essentially the class is "coming up" from a low-level implementation closer to the point of how it will be used.

We'll bring over consumeFromTo, renaming it advanceFromTo. Again, we have no instance variables directly accessed:

    public void advanceFromTo(String from, String to) {
        if (advanceFrom(from)) {
            advanceTo(to);
        }
    }

We got all the way to here, and from looking at the code, we have an edge condition issue, as to what hasMore means. Does it mean "has more characters remaining to the end of the string", or does it mean "has a current character", that is, is not at the end?

Following the principle of when in doubt, copy, we'll imitate the behavior of the Java collections, specifically Iterable, for which hasNext() means "has a current element", and next() takes it.

So hasMore() will be amended to mean "has a current character", and will be renamed to hasChar(), which is less ambiguous:

    public boolean hasChar() {
        return this.pos < this.len;
    }

Notice that we slipped isMatch into the code, without adding tests for it, so let's fix that now, moving the tests before those for the advanceFrom/To methods, since isMatch is lower level, that is, is used by advanceFrom and advanceTo.

The implementation of assertIsMatch quotes the string. Against this can be useful, in cases where the string would be a whitespace character.

        String msg = "pstr: " + pstr + "; str: '" + str + "'";
        assertEquals(msg, exp, pstr.isMatch(str));

Repetition in Tests

We write the advance* methods, and end up with this one for testAdvanceFromTo():

    public void testAdvanceFromTo() {
        assertCurrentChar('a', abc);
        abc.advanceFromTo("a", "c");
        assertCurrentChar('c', abc);

        assertCurrentChar('t', tenletters);

        // no change
        tenletters.advanceFromTo("e", "n");
        assertCurrentChar('t', tenletters);

        // to first e
        tenletters.advanceFromTo("t", "e");
        assertCurrentChar('e', tenletters);
        assertPosition(1, tenletters);

        // to second e
        tenletters.advanceFromTo("e", "e");
        assertCurrentChar('e', tenletters);
        assertPosition(4, tenletters);

        // to second t
        tenletters.advanceFromTo("e", "t");
        assertCurrentChar('t', tenletters);
        assertPosition(5, tenletters);

        // to third t (next character)
        tenletters.advanceFromTo("t", "t");
        assertCurrentChar('t', tenletters);
        assertPosition(6, tenletters);
    }

By using the higher-level (custom) assertions the code is much cleaner, but there is obvious repetition, so we will extract the common code into a method that executes a subset of behavior and assertions.

    protected void doAdvanceFromToTest(Character expChar, int expPos, PosString pstr, String from, String to) {
        pstr.advanceFromTo(from, to);
        assertCurrentChar(expChar, pstr);
        assertPosition(expPos, pstr);
    }

Our testAdvanceFromTo method is now much cleaner:

    public void testAdvanceFromTo() {
        assertCurrentChar('a', abc);
        doAdvanceFromToTest('c', 2, abc, "a", "c");

        assertCurrentChar('t', tenletters);

        // no change
        doAdvanceFromToTest('t', 0, tenletters, "e", "n");

        // to first e
        doAdvanceFromToTest('e', 1, tenletters, "t", "e");

        // to second e
        doAdvanceFromToTest('e', 4, tenletters, "e", "e");

        // to second t
        doAdvanceFromToTest('t', 5, tenletters, "e", "t");

        // to third t (next character)
        doAdvanceFromToTest('t', 6, tenletters, "t", "t");

        // to end of string
        doAdvanceFromToTest('s', 9, tenletters, "t", "s");

        // off end of string
        doAdvanceFromToTest(null, 10, tenletters, "s", "j");
    }

A simple rule of economics is as cost decreases, demand (usage/consumption) increases, and that rule applied to testing is that as tests are easier (and more fun, frankly) to write, people will write more tests.

Note in the example above, with the low-level functionality and testing put into a separate method, the test code can focus on the edge conditions, such as “what happens if I’m on the same string as the one I want to advance to?” and “what happens when one advances to the end of the string?”

Edge conditions are where code will fail. Beware of thinking that more assertions are better than fewer; quality of assertions is better than sheer mass. (And “Sheer Mass” would be a good name for a rock band.)

In the next post, we’ll discuss naming, “the essence of programming”.

Advertisements

One thought on “A Day Refactoring Code, Part 2: Unit Testing

  1. Pingback: A Day Refactoring Code, Part 3: Naming | Jeff Pace's Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s