Rewriting Glark

For the PVN project, I’ve wanted to use Glark as a library for matching text (for the ‘seek’ subcommand), but I’d written Glark as a command-line application, and its design reflected that. It also, like so many “field-tested” programs, had very few tests, on the expectation that because it was heavily used, flaws would surface quickly. That’s relatively accurate: I’m fairly sure that I use Glark more than any other program not built into Unix (I even have grep aliased to “glark -g”).

I was once asked in a job interview what my opinion was of my own code. My response was that my code evidently sucks, because I’m always rewriting it. That was definitely true of Glark: it was one of my first Ruby programs, it had been migrated from Perl, and I hadn’t touched it in a long time.

The problems with Glark were classics of bad programming: lack of tests, overly complicated code, use of global variables, poor class composition, and excessive coupling.

So I’ve been eager to rewrite Glark, but without tests, a program is much too brittle, so I knew that I’d first have to add tests. I simply can’t enjoy writing code without tests. However, with a comprehensive test suite, rewriting code is bliss. So I first wrote a few tests, then refined my test framework to the point that writing a unit test is as simple as this:

def test_simple
  fname = '/proj/org/incava/glark/test/resources/textfile.txt'
  # lines expected to match exactly one of the two patterns
  expected = [
              "    3   -rw-r--r--   1 jpace jpace   45450 2010-12-04 15:24 02-TheMillersTale.txt",
              "   10   -rw-r--r--   1 jpace jpace   64791 2010-12-04 15:24 09-TheClerksTale.txt",
              "   20   -rw-r--r--   1 jpace jpace   49747 2010-12-04 15:24 19-TheMonksTale.txt",
              "   24   -rw-r--r--   1 jpace jpace   21141 2010-12-04 15:24 23-TheManciplesTale.txt",
             ]
  run_app_test expected, [ '--xor', '\b6\d{4}\b', 'TheM.*Tale' ], fname
end

That’s Glark matching 6nnnn ^ TheM*Tale, that is, lines that match either a five-digit number starting with 6 or “TheM.*Tale”, but not both. At this point, grep has added some of what made Glark quite distinct from it (highlighted/colorized matches and context), but Glark’s most unusual (and fun to program) feature is matching of compound expressions, such as:

% glark --and=3 write --or puts print **/*.rb

That is matching “write” within 3 lines of puts or print.

Do you need that very often? Nope. But it does come in handy, such as in the case of “I’m looking for where we are catching an InvalidArgumentException and logging it (within the next 5 lines) as an error”:

% glark --and=5 'catch.*InvalidArgumentException' 'Log.error' **/*.java
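To make the idea of “A within N lines of B” concrete, here is a minimal sketch in Ruby. It is not Glark’s implementation: the and_within helper and the sample lines are hypothetical, and it assumes that two matches on the same line also count as being “within” the distance.

```ruby
# Hypothetical helper, not Glark's actual code: returns pairs of 1-based
# line numbers where `left` and `right` both match within `distance` lines
# of each other. Assumption: a distance of 0 (same line) counts as a match.
def and_within(lines, left, right, distance)
  left_hits  = lines.each_index.select { |i| lines[i] =~ left }
  right_hits = lines.each_index.select { |i| lines[i] =~ right }

  pairs = []
  left_hits.each do |l|
    right_hits.each do |r|
      pairs << [l + 1, r + 1] if (l - r).abs <= distance
    end
  end
  pairs
end

# Made-up input for illustration.
SAMPLE_LINES = [
  "def save(io)",
  "  io.write data",
  "  # flush",
  "  puts 'saved'",
  "end",
  "",
  "",
  "",
  "print 'done'"
]

# Roughly what "--and=3 write --or puts print" asks for:
# "write" within 3 lines of either "puts" or "print".
p and_within(SAMPLE_LINES, /write/, Regexp.union(/puts/, /print/), 3)
# prints [[2, 4]]
```

The “print” on line 9 is seven lines away from “write”, so it is not reported at a distance of 3.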

Speaking of interviews, a friend of mine has a good practice: when he goes to a company to assess their software, he asks to see what they consider their worst code. Often that code is at the core of their project and is the oldest code, written early on by someone who may have left the company, and/or it has had so much complexity piled onto it that it is difficult to untangle.

In Glark, the worst code of the entire application was the Options class, clocking in at 761 lines long, containing the 42 options in Glark. This class is a Singleton, which is the fancy Design Pattern way of saying Global Variable. (The worst by-product of the Gang of Four was the sanctifying of Singletons as being a good practice.)

Another sign that the Options class was written poorly is an insanely simple metric: wc. That is, running the “wc” command on all files, sorting them numerically, and looking at the largest files. There it is at the bottom, in all its corpulent glory:

% wc **/*.rb | sort -n
    4     7   160 lib/glark.rb
  102   654  5213 lib/glark/help.rb
  183   527  4569 lib/glark/input.rb
  248   640  6064 lib/glark/exprfactory.rb
  266   681  6052 lib/glark/output.rb
  297   777  7392 lib/glark/glark.rb
  440  1048  9663 lib/glark/expression.rb
  761  2615 23377 lib/glark/options.rb
 2301  6949 62490 total
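
The same metric is easy to reproduce in Ruby when wc is not at hand. This is a minimal sketch, with hypothetical method names (line_counts, report), that counts only lines (not words or bytes) and sorts smallest first, so the largest file lands at the bottom as in the listing above.

```ruby
# Hypothetical helpers for illustration. Returns [line_count, path] pairs,
# sorted ascending by count so the biggest file is listed last.
def line_counts(paths)
  paths.map { |path| [File.foreach(path).count, path] }.sort
end

# Prints a wc-like report for every file matching the glob pattern.
def report(pattern = '**/*.rb')
  line_counts(Dir.glob(pattern)).each do |count, path|
    puts format('%6d %s', count, path)
  end
end

# Run the report only when this file is executed directly.
report if __FILE__ == $PROGRAM_NAME
```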

The Options class is used throughout Glark, so extracting it was quite challenging. I decomposed the Options class into smaller groups, and it just so happened that there was a design to follow, documented in, of all places, the help (man) page for Glark. That is, because there are so many options, for legibility and organization they are displayed in the man page in the groups “input”, “matching”, “output”, and “debugging/errors”. So I repackaged the Glark options into input, match, output, and info, and also used that as the organization for the modules within Glark. Thus the Glark::File class went into lib/glark/input/file.rb, and Grep::Lines went into lib/glark/output/grep_lines.rb.
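
As a rough illustration of that kind of decomposition, here is a sketch in which one object holds four small option groups, and a collaborator receives only the group it needs instead of reaching for a global. Every class and field name here is hypothetical; none of it is taken from Glark’s actual code.

```ruby
# Hypothetical option groups, mirroring the input/match/output/info split.
InputOptions  = Struct.new(:recurse, :binary_files)
MatchOptions  = Struct.new(:ignore_case, :whole_words)
OutputOptions = Struct.new(:color, :context, :line_numbers)
InfoOptions   = Struct.new(:verbose, :quiet)

# Aggregates the groups; created once and passed along, not a Singleton.
class AppOptions
  attr_reader :input, :match, :output, :info

  def initialize
    @input  = InputOptions.new(false, 'skip')
    @match  = MatchOptions.new(false, false)
    @output = OutputOptions.new(true, 0, true)
    @info   = InfoOptions.new(false, false)
  end
end

# A collaborator that depends only on the output group.
class LineFormatter
  def initialize(output_options)
    @opts = output_options
  end

  def format_line(lnum, line)
    @opts.line_numbers ? "#{lnum}: #{line}" : line
  end
end

opts = AppOptions.new
puts LineFormatter.new(opts.output).format_line(3, 'TheMillersTale')
# prints "3: TheMillersTale"
```

Because LineFormatter takes its options as a constructor argument, a test can hand it a throwaway OutputOptions without touching any shared state.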

I continued to refine the tests to the point that adding new ones was trivial. As the test coverage increased, working on untested code became the aberration.

On that note, here’s an easy way to test your test. That is, to determine whether your test really works, break the code that it is testing. (A “return nil if true” at the beginning of a method works nicely.) If the test still passes, then it’s incomplete. If the test fails, then that should add confidence that test coverage is adequate. This is also where it becomes fun to break code, and to break tests. As with anything, if it’s fun, we’ll do more of it, which is why it is essential to have a test framework that makes tests easy (ergo, fun) to write.
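
Here is a minimal sketch of that check in plain Ruby, without a test framework. The count_matches method and the test_passes? helper are hypothetical stand-ins; the broken variant uses the “return nil if true” trick described above.

```ruby
# Hypothetical method under test: counts the lines matching a pattern.
def count_matches(lines, pattern)
  lines.count { |line| line =~ pattern }
end

# The same method, deliberately broken: it bails out before doing any work.
def count_matches_broken(lines, pattern)
  return nil if true
  lines.count { |line| line =~ pattern }
end

# A tiny stand-in for a unit test: true when the named implementation
# returns the expected count for a fixed input.
def test_passes?(impl)
  method(impl).call(%w[alpha beta gamma], /a$/) == 3
end

puts test_passes?(:count_matches)         # prints true
puts test_passes?(:count_matches_broken)  # prints false
```

If the second line had also printed true, the test would not really be exercising the method.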

For years I’ve tracked my daily progress by the simple metric of lines of code, but after hearing this suggested on the Ruby Rogues podcast, I’ve begun the practice of adding one user-facing feature per day, such as a new subcommand or option to PVN. My definition of “feature” includes adding and refining documentation, and it also includes removing options, especially if they are confusing, redundant, unused or obsolete.

I track features with a Features.txt in the root directory of each project, of the form (from PVN):

Thu Oct 25 19:22:52 2012

  seek command: added [ -C --no-color ] option.

This keeps me on track by actually recording features that are added. A script I run on all {project}/Features.txt files shows whether I’ve added one for today, and when I’ve missed a day (only one since I started doing this).
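
A check like that could look roughly like the following. This is a sketch under stated assumptions, not the actual script: it assumes every entry starts with a timestamp line in the format shown above, and the names STAMP, feature_dates, and missed_days are hypothetical. The second entry in the sample text is made up for illustration.

```ruby
require 'date'

# Matches timestamp lines such as "Thu Oct 25 19:22:52 2012".
STAMP = /\A[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}\s*\z/

# Returns the sorted, unique dates on which features were recorded.
def feature_dates(text)
  text.each_line
      .select { |line| line =~ STAMP }
      .map { |line| Date.strptime(line.strip, '%a %b %d %H:%M:%S %Y') }
      .uniq
      .sort
end

# Returns every day from the first entry through `today` with no entry.
def missed_days(dates, today)
  return [] if dates.empty?
  (dates.first..today).reject { |day| dates.include?(day) }
end

sample = <<~TEXT
  Thu Oct 25 19:22:52 2012

    seek command: added [ -C --no-color ] option.

  Sun Oct 28 08:00:00 2012

    (made-up entry for illustration)
TEXT

dates = feature_dates(sample)
today = Date.new(2012, 10, 28)
puts "added today: #{dates.include?(today)}"
puts "missed: #{missed_days(dates, today).map(&:to_s).join(', ')}"
```

To run it across projects, the text for each project could be read with File.read on its Features.txt, with Date.today passed as the second argument.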

My process for adding features feels like an extension of the TDD process, in short:

  • conceive of a feature
  • add it as a test
  • implement it
  • run tests
  • refactor the tests and code …
  • document the feature in the readme and help files
  • add the feature to the features file
  • commit with the feature description as the commit message.

That’s about it for this update on the coarse rewriting of Glark. You can track its progress on GitHub, and if you want to see the code before the rewrite, check out revision 4d10f192f46ec3df34f971f8b40e03f8df0aed27.