Recommended Books

This is a list of what I consider the best books relevant to programming, and perhaps, even to life.

Two of the older books, which shaped me during my C++ days, are Large Scale Software Design (by John Lakos) and Object-Oriented Design Heuristics (by Arthur Riel). The former is very relevant to Java projects, perhaps even more so now, since by its nature, Java makes it unclear as to the package hierarchy. That is, java.io and java.util are intertwined, with mutual dependencies. Having a clear package/module hierarchy can make it easier to understand the levels within a project, and how their behavior aggregates up through those levels.

Object-Oriented Design Heuristics is simply a must-read for anyone doing OO programming, which means essentially every programmer. Maybe even the functional programmers. It’s been a long time since I’ve read it, but one salient point that I remember is that managers are discouraged. That is, nonsense such as ResourcePoolManager, which often (usually? always?) act as god classes, and break the OO design by centralizing behavior instead of having it tightly integrated with the client classes. The book goes extensively into inheritance, and how is it abused and distorted, where classes within a hierarchy often break down in terms of being properly decoupled.

If you program, you must, absolutely must, understand regular expressions. I cannot fathom how anyone doing text-based programming (which also means essentially everyone) could not use regular expressions. Every day. Every hour. Much of the magic of dealing with huge code bases is simply mastering regular expressions. So the must-read book on this topic, by another Jeff, is Mastering Regular Expressions, by Jeffrey Friedl. It’s another book that if you read through chapter 3, you’ll understanding 80% of regular expressions, an application of the 80/20 rule, AKA the Pareto principle.

For Scrum, the best book is Agile Software Development with Scrum, by Ken Schwaber and Mike Beedle. Unlike those released by the book-by-the-pound publishers, this one is concise and direct, clocking in at around 150 pages. As with the book above, reading just three chapters will give you 80% of the full understanding of Scrum.

Hackers and Painters, by Paul Graham, is another must-read, probably the best book I’ve read about the mindset of great programmers, and creators/artists in general. It makes clear that programming is more art than science, and more painting than engineering, contrary to the roots, and biases, of this field. It’s so good that I recommend it not only to programmers, but to those fortunate people who have a programmer in their lives, and want to understand their mental processes.

Delving into non-programming books, I suggest The Fifth Discipline, by Peter Senge. It’s more about businesses than about software, but one point that programmers should appreciate is that there are systemic behaviors, or accidental behaviors, as Fred Brooks might say, resulting from the organization of a system, such as a business, but also programming teams, and even the design of their code. Trying to work around an inappropriately designed system can be extremely frustrating, even to the point of futility.

This might be an odd choice, but I’m going to shoehorn it in here anyway, because I like it so much: The Gentle Art of Verbal Self-Defense, by Suzette Elgin. It discusses patterns in communication, such as loaded questions, those with invalid premises (“If you really cared about getting this feature done, you wouldn’t be wasting time refining the build process”). Especially in this era where multi-cultural teams are the norm, it is imperative that people understand the deeper significance of the words they choose. It is somewhat like The Fifth Discipline, such that communication itself results in systemic behaviors and dynamics within a team.

I don’t think it’s too controversial to say that programmers are gifted, and by that I don’t mean that they are superior to others – just different. Very different. As Paul Graham elucidates in Hackers and Painters, there is simply a different internal mechanism among programmers, which results in their (illogical, to some) obsession with “minor details”, that is, the core of their work. (Programming is mostly just minor details, aggregated into huge systems.) Programmers also tend to be highly sensitive, as could be expected of people who have to closely watch for even slight variations in behavior or performance. So I suggest The Gifted Adult, by Mary-Elaine Jacobsen, which shows insights into how those people think and behave, and why their levels of concentration can be easy disrupted. I especially think that software managers should read it, to understand why the cats that they are trying to herd behave so much unlike “normal people”.

Wrapping up my recommendations is another really odd choice: The Path Between the Seas, by David McCullough. It’s about the building of the Panama Canal, about how after years of failed attempts, the canal succeeding in being built by two things: building railroads (to move workers and to remove dirt) and eliminating disease (yellow fever). John Stevens, the head engineer, devoted a significant amount of time (over a year, as I recall) building railroads instead of digging. So once the digging phase began, resources could be moved much more quickly, and with dirt removed, it reduced the exposure to mudslides from the frequent heavy rains. Dr. William Gorgas also solved a resource problem by eliminating disease, thus keeping workers productive. How this fits with software is that much of our work isn’t the digging per se; we have a one-off, with the system around the project itself, such as resources, both material and human.

Glark 1.10.0

Glark 1.10.0 is ready to be released, after a couple of years of being not a high priority for me. I was inspired to rewrite it when I looked through the code, much of it written early in my Ruby days. It began as a Perl script, and retained that scriptitude through its life.

At times while rewriting Glark, I wish that I’d blogged that experience. The short description is that I had a few primary guidelines:

  • Test as thorough as possible, ideally one test (at least) per feature. A feature is essentially the same as an option. Each source file/class should have an equivalent test case.
  • Keep files and classes small, and relatively even in size.
  • Simplify the set of options.
  • Eliminate global variables.
  • Add one feature per day.

Following these principles resulted in a code base I am much more satisfied with.

Previously much, if not most, of Glark was “field-tested”, a euphemism for “I tried it out a while ago, and I think it worked then.” As the test suite grew, the code became much easier to refactor with confidence.

Regarding the size of files and classes, I used a simple metric:

% wc lib/**/*.rb | sort -rn

And then I usually tackled what was at the bottom of the list.

The average file is now 59 lines long, with the largest being 201 lines, and the smallest, 10 lines. In the previous implementation, the smallest file was 102 lines, the largest, 761 lines, and the average, 328 lines.

Option processing was the major chunk of code tangled through the code base, primarily because there was a single Options class, a singleton used essentially everywhere throughout the code. I first split that into the subsets of options, such as those for the input options, for matching, and for output, with their equivalent submodules using only those option objects instead of the global/singleton.

This was further cleaned up by removing the option processing from what I eventually labeled their “specs”, the values that determined the behavior of the submodule. One idea that I had is that eventually those specs could be passed in from outside of Glark itself, for usage by external programs, such as PVN.

Adding one feature per day, which I’ve written about previously, motivated me to do some non-coding things, mainly writing documentation. I’ve realized that much of the distinguishing functionality of Glark hasn’t been well documented, and the man page has now increased from 927 lines to 1126.

It’s been an interesting evolution of the behavior of Glark, as much of its early functionality has been added into grep, such as colorized matches and context around them. Elaborating on what I wrote in the readme/man page:

Glark extends grep by matching complex expressions, such as “and”, “or”, and “xor”. This is useful in a case such as “I’m looking for “foo” and “bar”, within three lines from each other.” It can be infinitely complex, such as, also from the man page: “glark –and=5 Regexp –xor parse –and=3 boundary quote”, meaning: (within 5 lines of each other: (/Regexp/ and (/parse/ xor (within 3 lines of each other: /boundary/ and /quote/))))

Glark handles file, directory, and path arguments, optionally recursing directories to a certain depth, and processing path arguments as a set of files and directories. .svn and .git subdirectories are automatically excluded.

I realize, with a bit of guilt, that that defies the Unix principle of keeping programs small, and with minimal overlapping functionality, since much of that is already done by the “find” command. However, some of this behavior was included in early Glark, and grep itself has the “-r” option for recursing directories, so I wanted to extend that to be more advanced, in part because when running Glark on Windows systems, there is no “find” command.

Binary files are excluded (by default), but can, in the case of compressed or archived files, have their extracted contents be searched.

This rolls into Glark behavior that I’d wanted for a while, mainly for searching Jar files for class names, which I previously did via shell scripting, such as:

for i in *.jar; do jar tf $i | glark --label=$i Exception; done

That now can be done with:

glark --binary-files=list Exception *.jar

Glark can use a per-project configuration file, so different projects can have their own Glark parameters, such as different files to include and exclude for searching, and different colors for pattern highlighting. My goal there is that it can add feedback for when one is working in different projects, such as highlighting matches in Java code differently than in Ruby. Colorizing is still
only on a per-project basis, not on the file type itself, which I’m considering adding, since it might be helpful to distinguish matches in Ant build files from those in Gradle.

I am doing some final testing, and then the Ruby Gem should be available. I am looking for maintainers to repackage Glark as an RPM and a Debian package, although I will probably release unofficial packages for those within a few days as well.

Stati of Projects

I’ve been jumping around from one project to another on an as-needed basis, and here is the current status of each.

RIEL – I’ve been updating the colorizing code, adding the functionality to use extended color codes on ANSI terminals, instead of the default 10.

Glark – In the midst of a major rewrite, for both code purity and functionality. The main changes so far are extensions to the path/file arguments, bringing in some functionality from find. Glark will also (with the imminent integration of RIEL) support extended colors.

PVN – This project was in heavy development until a month ago, when it went into a state of waiting for updates to Glark, since it will be using Glark for its seek (searching) subcommand.

DiffJ – This project I rewrote in JRuby, but that was a little too slow for a command-line application –  even heavily tweaked, I couldn’t get the startup time under two seconds. So I re-rewrote it in Java, and intend to revisit it to add more intelligent code comparisons, such as understanding that “if (a == b) foo();” is the same as “if (a == b) { foo(); }”

IJDK – Mostly dormant at this point. When I’ve rewritten Java projects, I’ve tried to extract the generic code from them and add them to IJDK, but I haven’t been heavily involved with any of my Java projects lately.

Java-Diff – A while ago this was brought up to date to use generics, and was refactored for code clarity. There hasn’t been any reason to update it since then.

DoctorJ – Alas, this is dormant. Some of its warnings have been integrated into the Java compiler, such as mismatched parameter names. It still goes beyond that level of pedanticalness, so it is most suitable in the development of projects where documentation is paramount, such as APIs.

Related posts:

Tests Result in Better Coders

We know that tests result in better code. They also result in better programmers.

The first implementation of a project will often be utilitarian, complying with the edict of “just get it working”. Code will be written and tweaked until the tests pass, but there are few cycles, if any, devoted to refining the code to be higher quality. This the red/green/refactor cycle is limited to just red and green.

The refactoring phase is where the code goes from simply working to being well-crafted. As the mindset of the programmer changes from being focused on utility to being focused on quality, they program accordingly different.

With the principle of quality being foremost, and with the confidence from having a thorough set of tests, the programmer can also push the boundaries of their knowledge, such as delving into advanced object-oriented programming and metaprogramming, which can be dauntingly complex and risky. Those areas do not necessary have an immediate impact on functionality, so when a programmer is in functionality/utility mode, they’re less likely to employ those techniques. However, in quality mode, a programmer will feel more justified, and confident, in using those techniques, which over the long term can cause a dramatic improvement to a project.

Often forgotten is that both code and tests should be refactored. Test code that is unclear, misleading, or just wrong can be very frustrating to someone trying to understand a body of code, since the tests are the best starting point.

The bottom line is that programmers must understand that testing includes refactoring (including of the test code itself), and that the refactoring phase is where programmers, and projects, can become vastly better.

Related posts: