Refactoring Exception Handling in Java

It seems that database and storage code are especially prone to be mere delegators, such as:

class DatabaseConnection {
    void write(String table, String key, Object value) {
        database.write(table, key, value);
    }
}

That wouldn’t be so painful if not for the number of exceptions in this type of code, and that the exceptions are often converted from one type (such as “DatabaseConnection” or “NetworkException”) to a different one (such as the more generic “StorageException”).

That leads to code much like in the following scenario, where we have a more generic Storage class that delegates to a database (via a DatabaseConnection). The idea of the Storage class is that it doesn’t need to be to a database, and could instead be to a filesystem or to memory. As described above, the Storage class throws the more generic StorageException, as well the PermissionException, which the DatabaseConnection also throws.

The database connection has the following (partial, obviously) definition:

class DatabaseConnection {
    public void connect(User user) throws DatabaseException, NetworkException, PermissionException {
    }

    public void write(String table, String key, Object value) throws DatabaseException, NetworkException, PermissionException {
    }

    public Object read(String table, String key) throws DatabaseException, NetworkException, PermissionException {
        return null;
    }

    public void disconnect() throws DatabaseException, NetworkException, PermissionException {
    }
}    

And our exceptions and the minimal User class are:

class DatabaseException extends Exception {
}

class NetworkException extends Exception {
}

class PermissionException extends Exception {
}

class User {
}

class StorageException extends Exception {
    public StorageException(Exception cause) {
        super(cause);
    }
}

The first implementation of the Storage class is the version so common that it seems that there must be a name for this type of write-a-delegate-by-copying-and-pasting (WADBCAP?).

class Storage {
    private final DatabaseConnection dbc;

    public Storage(User user) throws StorageException, PermissionException {
        try {
            dbc = new DatabaseConnection();
            dbc.connect(user);
        }
        catch (DatabaseException de) {
            throw new StorageException(de);
        }
        catch (NetworkException ne) {
            throw new StorageException(ne);
        }
    }

    public void write(String table, String key, Object value) throws StorageException, PermissionException {
        try {
            dbc.write(table, key, value);
        }
        catch (DatabaseException de) {
            throw new StorageException(de);
        }
        catch (NetworkException ne) {
            throw new StorageException(ne);
        }
    }

    public Object read(String table, String key) throws StorageException, PermissionException {
        try {
            return dbc.read(table, key);
        }
        catch (DatabaseException de) {
            throw new StorageException(de);
        }
        catch (NetworkException ne) {
            throw new StorageException(ne);
        }
    }

    public void disconnect() throws StorageException, PermissionException {
        try {
            dbc.disconnect();
        }
        catch (DatabaseException de) {
            throw new StorageException(de);
        }
        catch (NetworkException ne) {
            throw new StorageException(ne);
        }
    }
}    

The redundancy there is overwhelming, so let’s try to reduce it. From a Ruby perspective, it is obvious that using closures would fix this, but we’re in a version of Java (1.6) that doesn’t have closures, and a quick reading of this suggests that combining closures and checked exceptions will not be for the weak or fainthearted.

A closure is really just an anonymous (unnamed) method, which is impossible in Java, but there is a close approximation: an anonymous inner class. For the above code, we want some code that will delegate to our quasi-closure, but handle the exceptions for us. Ergo something like, where we’ll implement the execute method:

abstract class DatabaseStorageAction {
    public abstract Object execute() throws DatabaseException, NetworkException, PermissionException;

    public Object run() throws StorageException, PermissionException {
        try {
            return execute();
        }
        catch (DatabaseException de) {
            throw new StorageException(de);
        }
        catch (NetworkException ne) {
            throw new StorageException(ne);
        }
    }
}

An improved version of the above class would handle generic types, so that types other than Object could be expected from the execute method.

But it’s good enough for now, so let’s write a new version of the Storage class:

class StoragePartTwo {
    private DatabaseConnection dbc;

    public StoragePartTwo(final User user) throws StorageException, PermissionException {
        DatabaseStorageAction dsa = new DatabaseStorageAction() {
                public Object execute() throws DatabaseException, NetworkException, PermissionException {
                    dbc = new DatabaseConnection();
                    dbc.connect(user);
                    return null;
                }
            };
        dsa.run();
    }

    public void write(final String table, final String key, final Object value) throws StorageException, PermissionException {
        DatabaseStorageAction dsa = new DatabaseStorageAction() {
                public Object execute() throws DatabaseException, NetworkException, PermissionException {
                    dbc.write(table, key, value);
                    return null;
                }
            };
        dsa.run();
    }

    public Object read(final String table, final String key) throws StorageException, PermissionException {
        DatabaseStorageAction dsa = new DatabaseStorageAction() {
                public Object execute() throws DatabaseException, NetworkException, PermissionException {
                    return dbc.read(table, key);
                }
            };
        return dsa.run();
    }

    public void disconnect() throws StorageException, PermissionException {
        DatabaseStorageAction dsa = new DatabaseStorageAction() {
                public Object execute() throws DatabaseException, NetworkException, PermissionException {
                    dbc.disconnect();
                    return null;
                }
            };
        dsa.run();
    }
}    

It is arguable as to which is better in the above example, but as the number of exception types grows, and the pre- and post-closure code is more complex, the latter version is more justified.

Advertisements

sort{ |a, b| a <=> b }.ing with Ruby

One idiom from Perl that I’ve missed with Ruby is the ability to chain comparisons together, such as:

my @a = qw{ this is a test };

$, = ", ";

print sort { substr($a, -1) cmp substr($b, -1) || length($a) <=> length($b) } @a;
print "\n";

Which results in the output:

a, is, this, test

In Ruby, it’s a little more complicated, since Perl evaluates a zero as false, but Ruby does not. However, the nonzero? method for all Ruby Numeric objects essentially performs this conversion, for use in a boolean evaluation, returning nil if it is zero, and the number otherwise. So in Ruby, the above code would be:

a = %w{ this is a test }

puts a.sort { |a, b| (a[-1] <=> b[-1]).nonzero? || a.length <=> b.length }.join(", ")

One additional note: if you’re using this in a spaceship method (“”) for the Comparable interface, remember that it must return a numeric value, so if you chain evaluations together, the final statement should be zero, since all previous evaluations were nil (meaning that they were equal). This bit me during some recent DiffJ work, and here is an example of a corrected method:

class Java::net.sourceforge.pmd.ast::Token
  include Comparable, Loggable

  # ...

  def <=> other
    (kind <=> other.kind).nonzero? ||
      (image <=> other.image).nonzero? ||
      0
  end

That’s DiffJ opening the PMD token Java class and adding the Ruby Comparable interface to it, so tokens can be sorted in Ruby collections.

On that note, DiffJ is in rough beta status now. I’m using it for my work (refactoring and cleaning up legacy Java code), and just corrected a glitch in the Token code, ironically enough, for supporting usage in Hash objects. I’d neglected to implement the eql? method, erroneously thinking that Hash uses the Comparable code.

With that fix, the JRuby implementation of DiffJ produces the same output as the Java implementation. It’s somewhat slower, so I’ve been investigating AOT compiling of it, but that doesn’t seem to have much of an effect.

I just realized that another feature from Perl that I’ve missed (and until writing that code above, hadn’t used for 10 years) is defining the array separator with the “$,” variable. Similar to that, my RIEL library modifies the to_s method of an Array to output “, ” between elements for output, since the default is to have no space between elements.

How to Write JUnit Test Sets in Rake

As DiffJ has been rewritten in JRuby, it now has numerous tests, with a group of tests in which each test corresponds to a add/delete/change type, such as methods being added to a class declaration.

I found myself wanting to run subsets of tests, such as all those for methods. The organization of the tests is that pathnames match the scope of the code being checked, so, for example, src/test/ruby/diffj/type/method/parameters/typechange/test.rb is the test for checking for a change in the type of parameters of a method, which is part of a type, meaning an interface or class. (As a parenthetical aside, the test files themselves match the test pathnames, so the two files being tested here are src/test/resources/diffj/type/method/parameters/typechange/d0/Changed.java and src/test/resources/diffj/type/method/parameters/typechange/d1/Changed.java. And yes, this begs for an Emacs shortcut for toggling among the files, which I haven’t written yet.)

Although I’m a Rake novice, I had an idea of what I wanted: a way to specify in the Rakefile a subset of tests. Yes, I probably could have written test suites, but for whatever reason, it seemed more logical to do this as part of a build. And being a Z shell user, I also knew the pattern to apply. More about that in a minute.

So here’s what I wanted: to run “rake test:method”, and have that run all the unit tests under src/test/ruby/diffj/type/method. Or in glob-speak, “src/test/ruby/diffj/type/method/**/test*.rb”. (My testcases without their own tests are named “tc.rb”, so src/test/ruby/diffj/type/method/parameter/tc.rb has code common to the testcases under src/test/ruby/diffj/type/method/parameter.)

Enjoying the wonderful experience of writing object-oriented build scripts (bitterly said as a professional maintainer of Ant code), I subclassed the Rake test task,

class DiffJRakeTestTask < Rake::TestTask
  def initialize name, filter = name
    super(('test:' + name) => :testscompile) do |t|
      t.libs << $srcmainrubydir
      t.libs << $srctestrubydir
      t.pattern = "#{$srctestrubydir}/**/#{filter}/**/test*.rb"
      t.warning = true
      t.verbose = true
    end
  end
end

DiffJRakeTestTask.new('field')
DiffJRakeTestTask.new('method')
# ... and others

Going roughly line by line, that takes a parameter such as “method”, creates a test task named “test:method”, and adds a glob for test files (“test*.rb”) under a directory matching the filter, which defaults to the name. The “lib” lines add the directories containing my Ruby source and test code.

Using the filter means we can also specify partial or full paths for a test subset (even just a single file), writing them as needed, such as when working on failing tests. My needed ones ended up being the following:

DiffJRakeTestTask.new('method/body/zeroone')
DiffJRakeTestTask.new('method/parameters/zeroone')
DiffJRakeTestTask.new('method/throws/zeroone')
DiffJRakeTestTask.new('method/parameters/reorder')
DiffJRakeTestTask.new('method/parameters/reorder/typechange')

And we also can use the “filter isn’t the same as the name” functionality with the “test:all” task:

DiffJRakeTestTask.new('all', '*')

Yes, that’s probably redundant with the default “tests” task, but I prefer the feeling of wielding more control.

It looks like I’ll have DiffJ fully implemented in JRuby by the end of the month. I’m only packaging it as Zip files, and would appreciate offers to repackage it for Ubuntu, Fedora and other distributions. I know nearly nothing about Eclipse, but have had received interest in DiffJ as an Eclipse plugin, so that is another valuable contribution someone could make.

A Day Refactoring Code, Part 4: Performance

This post is part of the series about refactoring code, and this post follows part 3.

With the functionality set, let’s focus on performance, since this class is heavily used, being at the core of DoctorJ. The main function in that regard is checkCurrentWord(), which we can immediately see uses a StringBuffer instead of the more performant StringBuilder.

But before we start doing any performance tuning, let’s get a rough estimate of how the current performance is. Running just TestParsingSpellChecker takes 14.735 seconds, and all 300+ tests take 27.098 seconds.

Here’s the outline of the method, in terms of how a word is extracted for checking:

    protected void checkCurrentWord() {
        StringBuffer word = new StringBuffer();
        word.append(currentChar());
        
        int startingPosition = this.pstr.getPosition();
        boolean canCheck = true;

        this.pstr.advancePosition();

        while (hasMore()) {
            char ch = currentChar();
            ... check by character ...
        }

        if (canCheck && this.pstr.getPosition() - startingPosition > 1) {
            checkWord(word.toString(), startingPosition);
        }
    }

Replacing StringBuffer with StringBuilder reduces the time to 11.457 seconds for the single test, although the time for all tests doesn’t change.

The most obvious problem with this code is that we are violating DRY with how the substring is extracted. The starting position is redundant with the word being built, so we can just remove the word buffer, and store the substring by its starting position, extracting it from pstr at the end of processing. (Note that java.lang.String is optimized for substring, with how it uses a char[] and offset).

We’ll need a new method in PString to get a substring at a position other than the current one, and that can be quickly coded and tested:

    /**
     * Returns a substring from the given position, of <code>num</code>
     * characters.
     */
    public String substring(int position, int num) {
        return this.str.substring(position, this.position + num);
    }

See the problem there? We have clashing variables, this.position and position. I’ve changed back and forth from _var to this.var, and am undecided.

Back to TestPSC, the new method gets significantly cleaned up. We have two integers, startingPosition and nChars, for when and if the substring needs to be extracted. the test now runs in 8.821 seconds.

Okay, those numbers are misleading, since that is just the Gradle execution time, which includes startup, compilation, etc., and from looking at the unit tests, there are no performance tests — the unit tests are mostly simple tests consisting of one or just a few lines of input data for the spell checker.

Let’s write a unit test, and get some data for it. We’ll use our own dictionary as input, getting a subset of its lines, and scrambling a few to be misspelled words:

    ~doctorj% perl -lane 'next if rand() < 0.95; $_ = reverse($_) if rand() < 0.2; print' etc/words.en_US > src/test/resources/wordlist.txt

Note that that goes into src/test/resources, which gets added to the build subdirectory by Gradle, and thus is in the classpath, to be read from our test code (with a couple of IJDK methods):

    public void testPerformance() {
        InputStream in = getClass().getClassLoader().getResourceAsStream("wordlist.txt");
        List<String> lines = InputStreamExt.readLines(in);

        // wrap as a Javadoc comment block:
        lines.add(0, "/** \n");
        lines.add("*/\n");

        String comment = StringExt.join(lines, "\n");
    }

First, let’s look at what the performance was with the old version of PSC, from 5 revisions back:

~doctorj% git checkout master~5 src/main/java/org/incava/text/spell/ParsingSpellChecker.java

With our test:

    public void testPerformance() {
        InputStream in = getClass().getClassLoader().getResourceAsStream("wordlist.txt");
        List<String> lines = InputStreamExt.readLines(in);

        // wrap as a Javadoc comment block:
        lines.add(0, "/** \n");
        lines.add("*/\n");

        String comment = StringExt.join(lines, "\n");

        long start = System.currentTimeMillis();
        
        tcsc.check(comment);

        long end = System.currentTimeMillis();

        long duration = end - start;
        System.err.println("# words: " + lines.size() + "; duration: " + duration);
    }

Alas, there’s nearly no difference:

previous code: # words: 7377; duration: 71273
new code: # words: 7377; duration: 69576

But it’s a slight improvement, and the code is much simpler. We also have a new class, PString, for usage in other scenarios in which a string is quasi-parsed.

Finally, we’ll disable our performance tests from the unit tests, check everything in, and look for the next thing to fix.

A Day Refactoring Code, Part 3: Naming

“Naming is the essence of programming.” -Fred Brooks, The Mythical Man-Month

Continuing from part 2 of this series, let’s now revisit our names, now that we have a better understanding of the behavior here.

PosString hasn’t grown on me as a name, and taking a bit of inspiration from the GString in Groovy, let’s rename it PString, which suggests “Processable String” and “Positionable String”. We’ll also improve the field names “pos” and “len” by renaming them as “position” and “length”. “str” is also quite short, but is a common name for a string, whereas “pos” and “len” are less common.

Going back to ParsingSpellChecker (which I’ll refer to as PSC), PString now has all of its low-level string processing functionality. There are methods in ParsingSpellChecker that seem more specific to its domain, such as skipLink().

The edge condition here would be a method such as skipThroughWord(), which advances to the next character that is not a letter or digit. As the descriptions of the methods get longer, they are more likely to be too specific for usage in a general class.

So that’s it for PString. Let’s start migrating ParsingSpellChecker to use it. First is that we’ll keep the instance fields as public, and just switch the
references:

            this.pstr = new PString(str);
            this.str = pstr.str;
            this.len = pstr.length;

Through the code we replace “this.pos” with “this.pstr.position”, so again the low-level behavior is the same, just with ParsingSpellChecker being highly coupled to PString, for now.

We’ll do likewise with the len/length fields, so now our PSC methods have code such as:

    protected void consumeTo(String what) {
        while (this.pstr.position < this.pstr.length && this.pstr.position + what.length() < this.pstr.length && !this.str.substring(this.pstr.position).startsWith(what)) {
            ++this.pstr.position;
        }
    }

We’re midstream on this task, but given our status (show by running the alias for “git diff –stat HEAD”) we see that we have 52 changed lines of code, which is a reasonable amount of code for a checkin:

~doctorj% gitdfs
 .../org/incava/text/spell/ParsingSpellChecker.java |   52 +++++++++-----------
 1 files changed, 23 insertions(+), 29 deletions(-)

We check it in and get back to work:

~doctorj% git commit -m "Began transition for ParsingSpellChecker to use PString." .
[master 95db8c1] Began transition for ParsingSpellChecker to use PString.
 1 files changed, 23 insertions(+), 29 deletions(-)

We remove this.str references, replacing them with “this.pstr.str”, then began “flipping” the methods in PSC to call their PString equivalents, such as this change (diff) for currentChar():

     protected Character currentChar() {
-        return this.str.charAt(this.pstr.position);
+        return this.pstr.currentChar();
     }

We make the transition a bit easier by renaming “consume()” to “consumeFrom()”, matching our PString method advanceFrom().

We go through the PSC code, replacing usage of PString fields with accessors and the equivalent mutators, such as in skipBlanks():

    // before

    protected void skipBlanks() {
        while (this.pstr.position + 2 < this.pstr.length && currentChar() != '<' && !this.str.substring(this.pstr.position, this.pstr.position + 2).equals("{@") && !Character.isLetterOrDigit(currentChar())) {
            ++this.pstr.position;
        }
    }

    // after

    protected void skipBlanks() {
        while (this.pstr.hasChar() && !this.pstr.isMatch("<") && !this.pstr.isMatch("{@") && !Character.isLetterOrDigit(currentChar())) {
            this.pstr.advancePosition();
        }
    }

That’s still the worst offender in terms of “width”, the number of characters wide a line is, which is a good way to find candidate code to be rewritten “skinnier”. (Other ways to find candidate code: by number of lines, and by depth of nested blocks, a variant of width).

In the next part of this series we’ll wrap it up with a look at performance and code quality.

A Day Refactoring Code, Part 2: Unit Testing

Continuing from part 1, we’ll begin writing test code from low-level functionality to higher-level, essentially paralleling the class as it grows in functionality as well.

It might seem like overkill to test very low-level code (such as getX()), but without low-level tests, a programmer may find himself analyzing the code at an improper level, mistaking for complex errors problems that are more fundamental.

Not testing low-level functionality led to a problem when I was prototyping a Javadoc parser (for DoctorJ). I wrote the original prototype (redundancy alert) in Ruby, then began migrating it to JRuby (intending to link it with Java code).

When I switched from Ruby to JRuby, there were failed unit tests, and I assumed that the problem was in my code, not in JRuby. (Yes, I adhere to the principle that “select ain’t broken”.)

Eventually, after much work, I discovered that a bug in the Regexp.last_match functionality in JRuby meant that the very high-level parsing functionality unit tests failed, and it took a lot of effort to drill down through the parsing code to discover that the issue was with the Regexp engine itself. I was thankful for the opportunity to contribute a fix for the issue, which was released with JRuby 1.6.5.

If I had had simpler tests, I probably would have found the issue sooner.

Back to unit testing in general, a good rule is to write tests in the order of the number of instance variables they test, either directly or indirectly. Thus the simplest tests would invoke methods that use only one instance variable (such as getX and setX). Subsequent tests should accrete in complexity, testing more instance variables.

This is essentially taking the concept from integration tests (multiple components) within unit testing itself, seeing variables (and sequences of methods and usage of multiple classes) as components as well. Thus unit testing begins with the smallest *unit* (variables), and expands to the point of being in the gray area where integration tests begin.

Back to this example, the next method to test is the one returning the character at the current position. Again we’ll write a custom assertion, then the test itself.

Implementing the tests such that it advances the position beyond the end of the string results in an exception:

        // out of range:
        str0.advancePosition(12);
        assertCurrentChar(null, str0);
java.lang.StringIndexOutOfBoundsException: String index out of range: 15
	at java.lang.String.charAt(String.java:694)
	at org.incava.text.PosString.currentChar(PosString.java:75)
	at org.incava.text.TestPosString.assertCurrentChar(TestPosString.java:36)
	at org.incava.text.TestPosString.testCurrentChar(TestPosString.java:84)

That shows that currentChar() should be updated to test that it is within bounds:

from:

        return this.str.charAt(this.pos);

to:

        return this.pos < this.len ? this.str.charAt(this.pos) : null;

The next test is for the hasMore method. What is important here is that we are testing the edge condition, for when pos is 1 less than len, is equal to len, and is 1 greater than it.

        str0.advancePosition();
        assertHasMore(true, str0);

        str0.advancePosition(12);
        assertHasMore(true, str0);

        str0.advancePosition();
        assertHasMore(false, str0);

        str0.advancePosition();
        assertHasMore(false, str0);

This seems to work … except that we didn't actually test the edge condition by hand. Why not? Laziness (false), where we just assumed that the string is 13 characters long.

Simplicity

This highlights another tenet of good unit testing: simplicity. The string we tested is so long ("this is a test" is my default string for string/regexp testing) that we didn't do the test "manually".

It's been a while that we've been rewriting the code, so we can consider checking this into our VCS, or we can just keep going.

When I'm on a multi-person project, I (nearly) never check in code that will fail the unit tests. In contrast, in my local environment, I find it easier to restart work after a break (especially when starting work in the morning) to have the project in the state of a failed unit test. This immediately gets me to the latest point of progress, and focuses my attention on the specific problem I was last working on. Buffer lists just aren't enough, since they only show me the latest files that I was working on, but not exactly what.

So we check in the code, then get back to the hasMore method. We made a mistake by, again, starting with overly complicated tests, in this case a string that was too long, so we'll go simple here, adding the field, and amending the method:

    private PosString abc;

    public void testHasMore() {
        assertHasMore(true, abc); // at 'a'
        // no change
        assertHasMore(true, abc);

        abc.advancePosition();  // at 'b'
        assertHasMore(true, abc);

        str0.advancePosition(); // at 'c'
        assertHasMore(true, abc);

        str0.advancePosition(); // off end
        assertHasMore(false, str0);
    }

Did you see the error? That was the first version that I wrote, which led to this failure:

junit.framework.AssertionFailedError: 'this is a test' (2) expected:<false> but was:<true>
	at org.incava.text.TestPosString.assertHasMore(TestPosString.java:44)
	at org.incava.text.TestPosString.testHasMore(TestPosString.java:123)    

‘this is a test’? This error was caused by my not doing a replace-all within the method, instead editing it “by hand”. (I’m convinced that copying and pasting, combined with search and replace, are responsible for 77.1% of software problems.)

We’ll fix our method:

    public void testHasMore() {
        assertHasMore(true, abc); // at 'a'
        // no change
        assertHasMore(true, abc);

        abc.advancePosition();  // at 'b'
        assertHasMore(true, abc);

        abc.advancePosition(); // at 'c'
        assertHasMore(true, abc);

        abc.advancePosition(); // off end
        assertHasMore(false, abc);
    }

Okay, that works, but should it? Is there “more” when the position is at the last character? I say no, so we change the hasMore code to call the method with num == 1:

    public boolean hasNumMore(int num) {
        return this.pos + num < this.len;
    }

We update the test:

        abc.advancePosition(); // at 'c'
        assertHasMore(false, abc);

Now the unit tests run without failure. Note that parameters to methods, and variables in general, have a length in characters proportionate to their scope. It's overkill to write "indexOfTheArray" in a small for-loop, whereas having instance variables named "td" are inadequately descriptive.

Writing the hasNumMore tests gives us a chance to use the Range class from the IJDK library, making for a bit terser and cleaner syntax:

    public void testHasNumMore() {
        for (int i : new Range(0, 10)) {
            assertHasNumMore(true, tenletters, i);
        }
    }

Here we're using another test string, "tenletters", which happens to be ten letters long. Fun, huh?

This is an excellent example of why custom assertions are valuable. Within a loop (checking a list of values), without a message for the assertEquals(), we would not know the value for which the assertion failed. But in this case, with the custom assertion, we get the output showing the value (10) that failed:

junit.framework.AssertionFailedError: pstr: ''tenletters' (0)'; num: 10 expected:<true> but was:<false>
	at org.incava.text.TestPosString.assertHasNumMore(TestPosString.java:51)
	at org.incava.text.TestPosString.testHasNumMore(TestPosString.java:131)

That actually shows that the test is incorrect: there are only 9 more characters from the initial position in the string. So we change the end of the range from 10 to 9 (that is, the range is 0 through 9), rerun, and the tests pass.

From the old class, we will pull the consume and consumeTo, amending their names and their code, starting with advanceTo, which is derived from the old consumeTo:

    public void advanceTo(String what) {
        while (this.pos < this.len && this.pos + what.length() < this.len && !this.str.substring(this.pos).startsWith(what)) {
            ++this.pos;
        }
    }

    public void testAdvanceTo() {
        assertCurrentChar('a', abc);
        abc.advanceTo("b");
        assertCurrentChar('b', abc);
        abc.advanceTo("c");
        assertCurrentChar('c', abc);
        abc.advanceTo("d");
        assertCurrentChar(null, abc);
    }

That raises another question: what happens when we advance to the end of the string? Should the current character be the last one, or should it be null?

We'll amend it to be null, and clean up the code, which has too long of a conditional (and redundant, with the "pos < len && pos + n < len" statements,
adding a new method, isMatch:

        while (this.pos < this.len && !isMatch(what)) {
            this.pos++;
        }

    public boolean isMatch(String what) {
        return this.pos + what.length() <= this.len && this.str.substring(this.pos).startsWith(what);
    }

Bringing over the old consumeFrom method gives us a chance to use the new isMatch method, for clearer code:

    protected boolean consume(String what) {
        skipBlanks();
        if (this.pos + what.length() < this.len && this.str.substring(this.pos).startsWith(what)) {
            this.pos += what.length();
            return true;
        }
        else {
            return false;
        }
    }

    public boolean advanceFrom(String what) {
        if (isMatch(what)) {
            advancePosition(what.length());
            return true;
        }
        else {
            return false;
        }
    }

This is in keeping with the principle of having fewer dots per line. more dots indicates a violation of the Law Of Demeter, with excessive low-level access to instance variables.

Notice that this method, as we've advanced through the class, is using no instance variables directly. Essentially the class is "coming up" from a low-level implementation closer to the point of how it will be used.

We'll bring over consumeFromTo, renaming it advanceFromTo. Again, we have no instance variables directly accessed:

    public void advanceFromTo(String from, String to) {
        if (advanceFrom(from)) {
            advanceTo(to);
        }
    }

We got all the way to here, and from looking at the code, we have an edge condition issue, as to what hasMore means. Does it mean "has more characters remaining to the end of the string", or does it mean "has a current character", that is, is not at the end?

Following the principle of when in doubt, copy, we'll imitate the behavior of the Java collections, specifically Iterable, for which hasNext() means "has a current element", and next() takes it.

So hasMore() will be amended to mean "has a current character", and will be renamed to hasChar(), which is less ambiguous:

    public boolean hasChar() {
        return this.pos < this.len;
    }

Notice that we slipped isMatch into the code, without adding tests for it, so let's fix that now, moving the tests before those for the advanceFrom/To methods, since isMatch is lower level, that is, is used by advanceFrom and advanceTo.

The implementation of assertIsMatch quotes the string. Against this can be useful, in cases where the string would be a whitespace character.

        String msg = "pstr: " + pstr + "; str: '" + str + "'";
        assertEquals(msg, exp, pstr.isMatch(str));

Repetition in Tests

We write the advance* methods, and end up with this one for testAdvanceFromTo():

    public void testAdvanceFromTo() {
        assertCurrentChar('a', abc);
        abc.advanceFromTo("a", "c");
        assertCurrentChar('c', abc);

        assertCurrentChar('t', tenletters);

        // no change
        tenletters.advanceFromTo("e", "n");
        assertCurrentChar('t', tenletters);

        // to first e
        tenletters.advanceFromTo("t", "e");
        assertCurrentChar('e', tenletters);
        assertPosition(1, tenletters);

        // to second e
        tenletters.advanceFromTo("e", "e");
        assertCurrentChar('e', tenletters);
        assertPosition(4, tenletters);

        // to second t
        tenletters.advanceFromTo("e", "t");
        assertCurrentChar('t', tenletters);
        assertPosition(5, tenletters);

        // to third t (next character)
        tenletters.advanceFromTo("t", "t");
        assertCurrentChar('t', tenletters);
        assertPosition(6, tenletters);
    }

By using the higher-level (custom) assertions the code is much cleaner, but there is obvious repetition, so we will extract the common code into a method that executes a subset of behavior and assertions.

    protected void doAdvanceFromToTest(Character expChar, int expPos, PosString pstr, String from, String to) {
        pstr.advanceFromTo(from, to);
        assertCurrentChar(expChar, pstr);
        assertPosition(expPos, pstr);
    }

Our testAdvanceFromTo method is now much cleaner:

    public void testAdvanceFromTo() {
        assertCurrentChar('a', abc);
        doAdvanceFromToTest('c', 2, abc, "a", "c");

        assertCurrentChar('t', tenletters);

        // no change
        doAdvanceFromToTest('t', 0, tenletters, "e", "n");

        // to first e
        doAdvanceFromToTest('e', 1, tenletters, "t", "e");

        // to second e
        doAdvanceFromToTest('e', 4, tenletters, "e", "e");

        // to second t
        doAdvanceFromToTest('t', 5, tenletters, "e", "t");

        // to third t (next character)
        doAdvanceFromToTest('t', 6, tenletters, "t", "t");

        // to end of string
        doAdvanceFromToTest('s', 9, tenletters, "t", "s");

        // off end of string
        doAdvanceFromToTest(null, 10, tenletters, "s", "j");
    }

A simple rule of economics is as cost decreases, demand (usage/consumption) increases, and that rule applied to testing is that as tests are easier (and more fun, frankly) to write, people will write more tests.

Note in the example above, with the low-level functionality and testing put into a separate method, the test code can focus on the edge conditions, such as “what happens if I’m on the same string as the one I want to advance to?” and “what happens when one advances to the end of the string?”

Edge conditions are where code will fail. Beware of thinking that more assertions are better than fewer; quality of assertions is better than sheer mass. (And “Sheer Mass” would be a good name for a rock band.)

In the next post, we’ll discuss naming, “the essence of programming”.

A Day Refactoring Code, Part 1: Rewriting

This post continues the overview of code being refactored.

We’ll pull the str, pos, len fields and build a new class around them. Naturally, the first step is writing a unit test, beginning by testing the lowest-level functionality, which set, get, and advance. That low-level functionality is only an intermediate step in getting the class we want, since eventually we’ll have some more expansive behavior.

So for now those methods are public, but we’ll probably eliminate them or reduce their accessibility, thus encouraging usage of higher-level methods instead.

For each method we’ll be testing, we will have a custom assertion, an assertion that essentially wraps the low-level assertEquals with a more expressive message associated with it.

Custom assertions should generally follow the form:

    public void assertMethodName(VType expectedValue, OType objTested, Object ... methodArguments) {
        String msg = "obj: " + objTested + "; arguments: " + methodArguments;
        assertEquals(msg, exp, objTested.methodName(methodArguments));
    }

for example, here is the custom assertion for testing substring, which takes an
integer argument:

    public void assertSubstring(String exp, PString pstr, int pos, int num) {
        String msg = "pstr: " + pstr + "; pos: " + pos + "; num: " + num;
        assertEquals(msg, exp, pstr.substring(pos, num));
    }

This produces the output:

junit.framework.ComparisonFailure: pstr: 'abc' (0); pos: 1; num: 1 expected:<b> but was:<b>
	at junit.framework.Assert.assertEquals(Assert.java:85)
	at org.incava.text.TestPString.assertSubstring(TestPString.java:44)
	at org.incava.text.TestPString.testSubstringWithPosition(TestPString.java:110)

In comparison, unit tests that use the low-level assertEquals such as the follow result in the subsequent unit test output:

    public void assertSubstring(String exp, PString pstr, int pos, int num) {
        assertEquals(exp, pstr.substring(pos, num));
    }
junit.framework.ComparisonFailure: expected:<b> but was:<b>
	at junit.framework.Assert.assertEquals(Assert.java:85)
	at junit.framework.Assert.assertEquals(Assert.java:91)
	at org.incava.text.TestPString.assertSubstring(TestPString.java:48)
	at org.incava.text.TestPString.testSubstringWithPosition(TestPString.java:114)

In the first example, we know the position (1), number (1), and information about PString itself (its wrapped string is “abc”, and it’s at position 0).

In the next post, we’ll delve more deeply into the unit tests.

A Day Refactoring Code, Part 0: Overview

I’ve been rewriting much of DoctorJ, which has been suffering some code rot after several years of existence, now at major release 5.

While poking through the code, I found the class ParsingSpellChecker, which as its name implies, spell-checks, in particular working on Javadoc comment blocks, such as skipping <pre> blocks and "{@link" tags:

package org.incava.text.spell;

import java.io.InputStream;
import java.util.*;
import org.incava.ijdk.util.MultiMap;

public class ParsingSpellChecker {
    /**
     * Whether we can spell check, which we can't until we get a word or a
     * dictionary (which is a thing with a lot of words).
     */
    private boolean canCheck;

    /**
     * The spell checker, which doesn't parse.
     */
    private final SpellChecker checker;

    /**
     * The current string we're working on.
     */
    private String str;

    /**
     * The length of the current string.
     */
    private int len;

    /**
     * The current position within the string.
     */
    private int pos;
    
    public ParsingSpellChecker(SpellChecker checker) {
        this(checker, false);
    }

    public ParsingSpellChecker(SpellChecker checker, boolean canCheck) {
        this.checker = checker;
        this.canCheck = canCheck;
    }

    public boolean addDictionary(String dictionary) {
        this.canCheck = this.checker.addDictionary(dictionary) || this.canCheck;
        return this.canCheck;
    }

    public boolean addDictionary(InputStream dictionary) {
        this.canCheck = this.checker.addDictionary(dictionary) || this.canCheck;
        return this.canCheck;
    }

    public void addWord(String word) {
        this.checker.addWord(word);
        this.canCheck = true;
    }

    public void check(String str) {
        if (this.canCheck) {
            this.str = str;
            this.len = this.str.length();
            this.pos = 0;
    
            while (hasMore()) {
                skipToWord();
                if (hasMore()) {
                    if (Character.isLetter(currentChar())) {
                        checkCurrentWord();
                    }
                    else {
                        // not at an alpha character. Might be some screwy formatting or
                        // a nonstandard tag.
                        skipThroughWord();
                    }
                }
            }
        }
    }

    /**
     * Consumes from one string to another.
     */
    protected void consumeFromTo(String from, String to) {
        if (consume(from)) {
            consumeTo(to);
        }
    }

    /**
     * Skips content between pairs of brackets, a la HTML, such as, oh,
     * "<html>foobar</html>." Doesn't handle slash-gt, such as "<html
     * style="lackthereof" />".
     */
    protected void skipBracketedSection(String section) {
        consumeFromTo("<" + section + ">", "</" + section + ">");
    }

    protected void skipLink() {
        consumeFromTo("{@link", "}");
    }
    
    protected void skipBlanks() {
        while (this.pos + 2 < this.len && currentChar() != '<' && !this.str.substring(this.pos, this.pos + 2).equals("{@") && !Character.isLetterOrDigit(currentChar())) {
            ++this.pos;
        }
    }
    
    protected void skipToWord() {
        skipBracketedSection("code");
        skipBracketedSection("pre");
        skipLink();
        consume("&;nbsp;");
    }

    protected void checkWord(String word, int position) {
        MultiMap<Integer, String> nearMatches = new MultiMap<Integer, String>();
        boolean valid = this.checker.checkCorrectness(word, nearMatches);
        if (!valid) {
            wordMisspelled(word, position, nearMatches);
        }
    }

    protected void wordMisspelled(String word, int position, MultiMap<Integer, String> nearMatches) {
        int nPrinted = 0;
        final int printGoal = 15;
        for (int i = 0; nPrinted < printGoal && i < 4; ++i) { // 4 == max edit distance
            Collection<String> matches = nearMatches.get(i);
            if (matches != null) {
                for (String match : matches) {
                    System.out.println("    near match '" + match + "': " + i);
                    ++nPrinted;
                }
            }
        }
    }

    protected boolean consume(String what) {
        skipBlanks();
        if (this.pos + what.length() < this.len && this.str.substring(this.pos).startsWith(what)) {
            this.pos += what.length();
            return true;
        }
        else {
            return false;
        }
    }

    protected void consumeTo(String what) {
        int len = this.str.length();
        while (this.pos < len && this.pos + what.length() < len && !this.str.substring(this.pos).startsWith(what)) {
            ++this.pos;
        }
    }

    protected Character currentChar() {
        return this.str.charAt(this.pos);
    }

    protected boolean hasMore() {
        return this.pos < this.len;
    }

    protected void checkCurrentWord() {
        StringBuffer word = new StringBuffer();
        word.append(currentChar());
        
        int startingPosition = this.pos;
        boolean canCheck = true;

        ++this.pos;

        // spell check words that do not have:
        //     - mixed case (varName)
        //     - embedded punctuation ("wouldn't", "pkg.foo")
        //     - numbers (M16, BR549)
        while (hasMore()) {
            char ch = currentChar();
            if (Character.isWhitespace(ch)) {
                break;
            }
            else if (Character.isLowerCase(ch)) {
                word.append(ch);
                ++this.pos;
            }
            else if (Character.isUpperCase(ch)) {
                skipThroughWord();
                return;
            }
            else if (Character.isDigit(ch)) {
                skipThroughWord();
                return;
            }
            else {
                // must be punctuation, which we can check it if there's nothing
                // but punctuation up to the next space or end of string.
                if (this.pos + 1 == this.len) {
                    // that's OK to check
                    break;
                }
                else {
                    ++this.pos;
                    while (hasMore() && !Character.isWhitespace(currentChar()) && !Character.isLetterOrDigit(currentChar())) {
                        // skipping through punctuation
                        ++this.pos;
                    }
                    if (this.pos == this.len || Character.isWhitespace(currentChar())) {
                        // punctuation ended the word, so we can check this
                        break;
                    }
                    else {
                        // punctuation did NOT end the word, so we can NOT check this
                        skipThroughWord();
                        return;
                    }
                }
            }
        }

        // has to be more than one character:
        if (canCheck && this.pos - startingPosition > 1) {
            checkWord(word.toString(), startingPosition);
        }
    }

    protected void skipThroughWord() {
        ++this.pos;
        while (hasMore() && !Character.isWhitespace(currentChar())) {
            ++this.pos;
        }
    }
}

The problem with ParsingSpellChecker is that the low-level functionality (going through a string by “position” (current index))and checking against string length) don’t match the more high-level behavior of the class.

Another red flag in the class is there are three instance fields (len, pos, and str) that are very tightly connected, with essentially each of them having to be used with one another.

It is also telling that the indentation levels within the class are very deep, also indicating poor abstraction. The fewer indentation levels, the more likely it is that the code is evenly abstracted at the proper level.

Code repetition is another issue, with statements such as this.pos++ found in many locations.

From running PMD, we know that StringBuilder is preferred over StringBuffer for performance reasons. Similarly, by building its own substrings (via StringBuffer.append()), the code does not take advantage of String.substring, which uses offsets in the newly-created String to refer to a substring within the lower-level char array.

That’s the overview. In the next post, we’ll begin rewriting this mess.

Whence System.exit()?

A few days ago I was asked how to determine where in a Java program the call System.exit() is made. I came up with the following:

import java.util.Map;

public class TestSystemExit {
    public void methodOne() {
        methodTwo();
    }

    public void methodTwo() {
        methodThree();
    }

    public void methodThree() {
        methodFour();
    }
    
    public void methodFour() {
        System.exit(0);
    }

    public static void printStackTrace(Thread thread, StackTraceElement[] stElmts) {
        System.err.println("thread: " + thread);
        for (StackTraceElement stElmt : stElmts) {
            System.err.println("\tat " + stElmt);
        }
    }

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                Map stMap = Thread.getAllStackTraces();
                
                // only dump the stack trace with the method "exit":                
                for (Map.Entry stElmt : stMap.entrySet()) {
                    for (StackTraceElement st : stElmt.getValue()) {
                        if ((st.getClassName() + "." + st.getMethodName()).equals("java.lang.System.exit")) {
                            printStackTrace(stElmt.getKey(), stElmt.getValue());
                            break;
                        }
                    }
                }
            }
        });
        
        TestSystemExit tse = new TestSystemExit();
        tse.methodOne();
    }
}

The output is as follows:

thread: Thread[main,5,main]
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1203)
	at java.lang.Thread.join(Thread.java:1256)
	at java.lang.ApplicationShutdownHooks.run(ApplicationShutdownHooks.java:97)
	at java.lang.Shutdown.runHooks(Shutdown.java:106)
	at java.lang.Shutdown.sequence(Shutdown.java:150)
	at java.lang.Shutdown.exit(Shutdown.java:195)
	at java.lang.Runtime.exit(Runtime.java:107)
	at java.lang.System.exit(System.java:923)
	at TestSystemExit.methodFour(TestSystemExit.java:17)
	at TestSystemExit.methodThree(TestSystemExit.java:13)
	at TestSystemExit.methodTwo(TestSystemExit.java:9)
	at TestSystemExit.methodOne(TestSystemExit.java:5)
	at TestSystemExit.main(TestSystemExit.java:46)

DoctorJ 5.2.0 and DiffJ 1.2.0 Released

DoctorJ 5.2.0 is an extensive rewrite of the code, with unit tests expanded and refined. It uses the latest version of PMD as the parsing and AST code, and is the first version of DoctorJ to use the IJDK module, leading to much more elegant code. In terms of functionality this version add spell-checking for strings of a minimal length (which defaults to 2). The project now builds with Gradle.

DiffJ 1.2.0 also has significantly rewritten code, and also uses the IJDK module. This version adds (on terminals that support it) colorization of differences. It too uses Gradle for its build.

Both projects are now distributed only in zip format, and I welcome offers to repackage them for various package managers.

The distributes are available for download: