Glark 1.10.0 is ready to be released, after a couple of years in which it was not a high priority for me. I was inspired to rewrite it when I looked through the code, much of it written in my early Ruby days. It began as a Perl script, and retained that scriptitude throughout its life.
At times while rewriting Glark, I wished that I had blogged about the experience. The short description is that I had a few primary guidelines:
- Test as thoroughly as possible, ideally with at least one test per feature. A feature is essentially the same as an option. Each source file/class should have an equivalent test case.
- Keep files and classes small, and relatively even in size.
- Simplify the set of options.
- Eliminate global variables.
- Add one feature per day.
Following these principles resulted in a code base I am much more satisfied with.
Previously much, if not most, of Glark was “field-tested”, a euphemism for “I tried it out a while ago, and I think it worked then.” As the test suite grew, the code became much easier to refactor with confidence.
Regarding the size of files and classes, I used a simple metric:
% wc lib/**/*.rb | sort -rn
And then I usually tackled what was at the bottom of the list.
The average file is now 59 lines long, with the largest being 201 lines, and the smallest, 10 lines. In the previous implementation, the smallest file was 102 lines, the largest, 761 lines, and the average, 328 lines.
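The same kind of summary can be computed in Ruby. This is only an illustrative sketch, not part of Glark; `count_lines` and `line_stats` are hypothetical names, and the `lib` default simply mirrors the path in the command above.

```ruby
# Hypothetical helpers, not part of Glark: summarize source file sizes.

# Map each Ruby file under dir to its line count.
def count_lines(dir = "lib")
  Dir.glob(File.join(dir, "**", "*.rb")).to_h do |path|
    [path, File.foreach(path).count]
  end
end

# Smallest, largest, and (rounded) average line count, or nil if no files.
def line_stats(counts)
  return nil if counts.empty?

  sizes = counts.values
  {
    smallest: sizes.min,
    largest: sizes.max,
    average: (sizes.sum.to_f / sizes.size).round
  }
end
```

Calling `line_stats(count_lines)` from the project root would report the three numbers for the current tree.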
Option processing was the major chunk of code tangled through the code base, primarily because there was a single Options class, a singleton used essentially everywhere. I first split that into subsets of options, such as those for input, for matching, and for output, with their equivalent submodules using only those option objects instead of the global/singleton.
This was further cleaned up by separating the option processing from what I eventually labeled their “specs”, the values that determine the behavior of the submodule. One idea that I had is that eventually those specs could be passed in from outside of Glark itself, for use by external programs, such as PVN.
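A minimal sketch of that separation, under heavy assumptions: every name here (`MatchSpec`, `OutputSpec`, `specs_from`, `Matcher`) and every field is hypothetical and chosen for illustration, and none of it is taken from Glark's actual code.

```ruby
# Hypothetical names throughout; Glark's real classes and fields may differ.

# A spec holds only the values that drive one submodule's behavior.
MatchSpec = Struct.new(:ignore_case, :whole_words, keyword_init: true)
OutputSpec = Struct.new(:show_line_numbers, :highlight, keyword_init: true)

# Option processing lives apart from the specs: it turns already-parsed
# settings (a plain Hash here) into spec objects.
def specs_from(settings)
  match = MatchSpec.new(
    ignore_case: settings.fetch(:ignore_case, false),
    whole_words: settings.fetch(:whole_words, false)
  )
  output = OutputSpec.new(
    show_line_numbers: settings.fetch(:show_line_numbers, true),
    highlight: settings.fetch(:highlight, false)
  )
  [match, output]
end

# A submodule receives just the spec it needs, never a global singleton.
class Matcher
  def initialize(spec)
    @spec = spec
  end

  # Build the regular expression described by the spec.
  def regexp_for(pattern)
    source = @spec.whole_words ? "\\b#{pattern}\\b" : pattern
    Regexp.new(source, @spec.ignore_case ? Regexp::IGNORECASE : 0)
  end
end
```

Because a spec is a plain value object, an external program could construct one directly, for example `Matcher.new(MatchSpec.new(ignore_case: true, whole_words: false))`, without going through any option parsing.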
Adding one feature per day, which I’ve written about previously, motivated me to do some non-coding things, mainly writing documentation. I’ve realized that much of the distinguishing functionality of Glark hasn’t been well documented, and the man page has now increased from 927 lines to 1126.
It’s been an interesting evolution of the behavior of Glark, as much of its early functionality has been added into grep, such as colorized matches and context around them. Elaborating on what I wrote in the readme/man page:
Glark extends grep by matching complex expressions, such as “and”, “or”, and “xor”. This is useful in a case such as: I’m looking for “foo” and “bar”, within three lines of each other. Expressions can be nested to any depth, such as, also from the man page: “glark --and=5 Regexp --xor parse --and=3 boundary quote”, meaning: (within 5 lines of each other: (/Regexp/ and (/parse/ xor (within 3 lines of each other: /boundary/ and /quote/))))
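To make the “within N lines” and “xor” ideas concrete, here is a deliberately simplified Ruby sketch. It is not Glark's implementation: `and_within` and `xor_lines` are hypothetical helpers, they work on an in-memory array of lines, and they do not attempt the nesting shown above.

```ruby
# Simplified sketch, not Glark's implementation.

# Pairs of line indexes where one line matches `first`, one matches
# `second`, and the two are at most `distance` lines apart.
def and_within(lines, first, second, distance)
  first_hits = lines.each_index.select { |i| lines[i].match?(first) }
  second_hits = lines.each_index.select { |i| lines[i].match?(second) }
  first_hits.product(second_hits)
            .select { |a, b| (a - b).abs <= distance }
            .map { |a, b| [a, b].minmax }
            .uniq
end

# Indexes of lines that match exactly one of the two patterns.
def xor_lines(lines, first, second)
  lines.each_index.select { |i| lines[i].match?(first) ^ lines[i].match?(second) }
end
```

In this sketch a distance of 0 means both patterns must match on the same line, which is an assumption of the example rather than a statement about Glark's behavior.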
Glark handles file, directory, and path arguments, optionally recursing directories to a certain depth, and processing path arguments as a set of files and directories. .svn and .git subdirectories are automatically excluded.
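The recursion can be pictured with a short Ruby sketch. This is an assumption-laden illustration rather than Glark's code: `files_under`, `EXCLUDED_DIRS`, and the exact meaning of `max_depth` (0 meaning only the files directly inside the starting directory) are all choices made for the example.

```ruby
# Hypothetical sketch, not Glark's code.

# Entries with these names are skipped wherever they appear.
EXCLUDED_DIRS = %w[.svn .git].freeze

# Collect files under path, descending at most max_depth levels of
# subdirectories (nil means no limit).
def files_under(path, max_depth: nil, depth: 0)
  return [path] if File.file?(path)
  return [] unless File.directory?(path)
  return [] if max_depth && depth > max_depth

  Dir.children(path).sort.flat_map do |name|
    next [] if EXCLUDED_DIRS.include?(name)

    files_under(File.join(path, name), max_depth: max_depth, depth: depth + 1)
  end
end
```

A plain file argument is returned as-is, so a mixed list of files and directories can be handled by mapping each argument through `files_under`.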
I realize, with a bit of guilt, that this defies the Unix principle of keeping programs small, with minimal overlapping functionality, since much of it is already done by the “find” command. However, some of this behavior was included in early Glark, and grep itself has the “-r” option for recursing directories, so I wanted to extend that to be more advanced, in part because when running Glark on Windows systems, there is no Unix-style “find” command.
Binary files are excluded by default, but compressed or archived files can have their extracted contents searched.
This rolls into Glark behavior that I’d wanted for a while, mainly for searching Jar files for class names, which I previously did via shell scripting, such as:
for i in *.jar; do jar tf "$i" | glark --label="$i" Exception; done
That now can be done with:
glark --binary-files=list Exception *.jar
Glark can use a per-project configuration file, so different projects can have their own Glark parameters, such as different files to include and exclude for searching, and different colors for pattern highlighting. My goal there is to give feedback about which project one is working in, such as by highlighting matches in Java code differently than in Ruby. Colorizing is still only on a per-project basis, not per file type, which I’m considering adding, since it might be helpful to distinguish matches in Ant build files from those in Gradle.
I am doing some final testing, and then the Ruby Gem should be available. I am looking for maintainers to repackage Glark as an RPM and a Debian package, although I will probably release unofficial packages for those within a few days as well.