Introducing XuMoQi

Ki and za! Zax and zoa! Xyst and zyme! Yurta and zowie!

As I’ve written about before, I have been developing a game to help learn Scrabble words. I’m what might be considered a reluctant Android developer, being no major fan of Java, XML, and Eclipse, but I wanted an app for my phone so that I could learn and practice words for Scrabble, my board and electronic game of choice.

Other apps, such as Syrious Scramble, seem to use a dictionary other than those used for Scrabble, so I disliked the feeling of playing a word in those games and being told it was invalid, such as ain.

So I looked, in vain (and ain, eventually) for a Scrabble-compliant word app, and finding none, I decided to write one, XuMoQi, which is barely not none. That is, it is extraordinary in its plainness, overarching in being one-dimensional. (Okay, two dimensions, but only barely.)

Some random thoughts and experiences while writing the app follow.

What up, Dawg?

The word lists are huge, and I tried to optimize them for speed instead of space. I am considering rewriting them as a modified DAWG (http://stevehanov.ca/blog/index.php?id=115), but haven’t yet figured out how to get my searching and pattern-matching to fit with that structure, which presumes that all characters are known before doing the search.
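As a rough sketch of how pattern-matching with unknown letters could sit on top of such a structure, here is a plain trie (the unminimized relative of a DAWG) built from nested hashes, in Ruby rather than the app's Java. This is not XuMoQi's code: the names trie_add and trie_match are hypothetical, and "." is assumed to stand for one unknown letter. The idea is that a wildcard branches into every child node instead of following a single edge.

```ruby
# Adds a word to a nested-hash trie, marking the final node with :end.
def trie_add(root, word)
  node = root
  word.each_char { |ch| node = (node[ch] ||= {}) }
  node[:end] = true
end

# Returns the words under node that match pattern, where "." is any letter.
def trie_match(node, pattern, prefix = "")
  return (node[:end] ? [prefix] : []) if pattern.empty?
  ch = pattern[0]
  rest = pattern[1..-1]
  if ch == "."
    # Unknown letter: branch into every child rather than following one edge.
    node.reject { |key, _| key == :end }
        .flat_map { |key, child| trie_match(child, rest, prefix + key) }
  elsif node[ch]
    trie_match(node[ch], rest, prefix + ch)
  else
    []
  end
end

root = {}
%w[suq qi xu mo zax zoa].each { |word| trie_add(root, word) }
p trie_match(root, ".uq")   # ["suq"]
p trie_match(root, "z..")   # ["zax", "zoa"]
```

The same walk should carry over to a DAWG, since minimizing only merges shared suffixes and leaves the edges as they are.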

Field-testing the app showed that it is reasonable in its speed, and I haven’t had any out of memory errors. Then again, I’ve tested in a field of one, on my relatively newish phone.

All work and no play …

Pattern-matching is done in separate threads, usually taking less time than the duration of the user inputting their words. I found this (http://vaibhavtolia.wordpress.com/2013/10/03/79) to be a good resource on explaining the apparent overuse of, uh, resources.

Editing the EditView

Alas, the Android library doesn’t seem to have a decent (or any) EditView to fit my needs: no auto suggestion, no word completion, and submit (or send) when the enter (angled arrow) is pressed. So I wrote my own, which is on GitHub here: https://github.com/jpace/xumoqi/blob/master/src/org/incava/xumoqi/MultiLineSendEditText.java

Iconoclast Icons

I scaled the vertical learning curve of Gimp (is it really a curve if it’s a straight line?) and developed a few icons, rolling them by hand and applying various light effects. I started by following this http://www.gimpshop.com/tutorials/how-to-create-a-logo guide, which helped me learn the basics, then I found the Filters > Logo functionality, which scripts a set of filters in preset ways. Eventually I developed, with the Chrome effect, this masterpiece:

xu7

I charged myself nothing for my work as a graphics artist, and got my money’s worth.

I wasn’t sure how to change the icons, and found an excellent guide here: http://bigknol.com/open-blog/2013/10/change-android-launcher-icon-using-eclipse-ide/.

Screenshots

As if I needed more things to dislike about Eclipse, the version I’m using, Juno Service Release 1, would not take screenshots (perhaps this is a feature, not a bug), and with such a beautiful app as this, it was imperative to capture such artistry. So instead of running the emulator through Eclipse, I ran it off the command line, as described here (http://www.addthis.com/blog/2013/07/22/10-tips-for-android-emulator/#.Uwp0v1ErFG4):

% cd /opt/adt-bundle/adt-bundle-linux-x86_64-20131030/sdk
% ./tools/android avd
% ./platform-tools/adb shell screencap -p | sed 's/\r$//' > ~incava/android/v1.2/screen2.png

Table with equal columns

Maddening it was to get my table to display columns with an equal width. The answer, naturally, is that the width should be set to 0:

http://androidadvice.blogspot.com/2010/10/tablelayout-columns-equal-width.html.

Name

XuMoQi comes from three two-letter words commonly used in Scrabble, xu, mo, and (wait for it) qi. I pronounce it “zoo-MOE-key”, but when she saw it, my wife immediately said “zoo monkey”. So if this app ever gets a mascot, you know what it will be.

Exporting

It wasn’t intuitive to figure out how to export my app such that Google would accept it, but this explains it: http://help.testflightapp.com/customer/portal/articles/1279844-how-to-create-an-apk-adt-bundle.

Future Features

I believe that I have adequately described XuMoQi as primitive, primitive in the sense of the opening of 2001: A Space Odyssey, but with fewer bones. Based on feedback, and my own experience, I plan to make it more game-like, i.e., keeping scores, rating words based on difficulty, and adding a timer.

And although the graphics are incredibly perfect, I’ll probably find some way to make them even better.

Getting the App

In the unlikely chance that anyone wants to get this app, it is available on (in?) Google Play Store.

Hello, (a Different Part of the) World

Last week I traveled to Dubai and Qatar, a vacation with my spouse and an opportunity to see the Middle East.

Dubai was luxurious, a center of wealth and commerce, a spot where my spouse wanted to indulge in some well-deserved pampering. But I got out often, mostly on foot, and explored the environs (Deira City).

I really enjoyed the Middle Eastern cuisine, and foul medames is going to be rotated into my carb-free diet. Of course, it being a Muslim country, you’re out of luck if you’re craving (pork) bacon – I wasn’t – but it was interesting to see the alternatives: had I access to a kitchen, I would have loved to try the veal bacon we found in a supermarket. And beef bacon, which I did have, could also rotate into my diet, it being much less greasy (that’s a feature) than pork bacon.

Being an aficionado of tall buildings (my when-I-grow-up goal from the age of 12 to 16 or so was to be an architect), I had to go up the Burj Khalifa, at 2717 feet the tallest building on Earth. It was almost comical how tall it is, literally towering over the 1000+ foot buildings around it. A beautiful piece of architecture, and well worth the price.

We stayed at the Park Hyatt, a western-style hotel, but one matching the design of the older Middle East, as opposed to the glass-and-steel buildings in the area that could have been in the skyline of Chicago.

A couple of matters about Dubai: I’d read online that they’re very strict about bringing things into the country, specifically alcohol, tobacco, and drugs, including prescription medications. So I spent some time beforehand, getting copies of my prescriptions, then anxiously awaited the full-body and cavity searches when I arrived.

Hah. It’s simpler to get into a Smithsonian museum than to go through customs in Dubai. In fact, I wasn’t really sure that I was going through it until I was done. That’s it? Much worrying for nothing, and I guess for the first time, there was incorrect information on the Internet.

Note: Dubai is dry, i.e., alcohol isn’t sold, except in hotels, where it is insanely expensive, at five to ten times the cost in a store over here. But at the Dubai airport, you can go through the duty-free shop on the way out of the airport and bring up to five bottles out. So, for all my readers who are going to Dubai and want to drink wine while you’re there, that’s my helpful tip. You’re welcome.

Dubai is also very orderly. I noticed that people don’t even jaywalk, even at night with no cars around, and just patiently wait for the pedestrian signal to switch to green.

On to Qatar, to the west. I traveled there alone, while the missus left in the opposite direction, back to where she is currently working. I had an overnight layover in Doha, so I took off, on foot, and walked up to Souk Waqif, an outdoor (mostly) market to the north of the city. Getting up there was no easy feat, since my own feet were frequently ensconced in a nice thicket of mud, left there by the recent rain. Evidently when you almost never get rain, you don’t build drains for the roads, so much of the water went onto the “sidewalks”, i.e., the area, possibly but not necessarily paved, alongside the roads.

And Qatar is not orderly. I had a couple of close calls with cars: there evidently is no right of way for pedestrians. More like “you can get right out of the way”. That might be one reason that I encountered nearly no one else during my five or six mile stroll around the city.

Souk Waqif was fascinating. I love being a fish out of water, and as a nearly-stereotypical North American, I didn’t fit in at all. Perfect. Nearly everyone there was Arabic, but there was enough written and spoken English that I didn’t have any difficulty.

I saw and listened to an outdoor performance, and Middle Eastern music is becoming a new favorite, especially since I tend to like music that others abhor (metal, old country and western, and disco). On the plane I listened to El Liala’s Khalina Lewahdina, a very catchy mixture of Arabic music and disco (sorry, I mean “hip-hop”), and I must have listened to it ten times.

At Souk Waqif I went to an outdoor cafe and had an excellent dinner: lamb and chicken kabobs, moutabai, fattouch, and a couple of (non-alcoholic) drinks, whose names escape me. I also enjoyed, for the first time, a shisha, of grape and mint. I need one of those. (Off Jeff goes to Amazon … ). I guess that’s what is otherwise known as a hookah. Well, consider my horizons broadened.

After I ordered my dinner (and yes, this blog entry is about to have a reference to programming, so those of you that have stuck this far will have your patience rewarded), I got out my phone and decided to do a little Scrabble practice. See, I’ve been writing an Android app (my first) to help me (and anyone else) practice Scrabble words. I’m a big-time Scrabble fan (jpace317 on Origin), and although I’m decent at bingos, I’m trying to up my game with the three- and four-letter words. (I’m looking in your direction, zax and chay.)

So I fired up my app, XuMoQi, and the first query was: “.uq” (as in, what is the missing letter). Answer, “suq”, a variant spelling of … wait for it … “souk”. As in Souk Waqif. Holy freaking moley.

(I seem to have a bit of a weird-timing record with serendipitous moments like that, such as my car finally giving up the ghost (actually, its transmission, at 175,038 miles), while I was driving out to the dealer to buy my new car. Oh, and adding to the weirdness is that it was on my birthday.)

Anyway, that’s the story of my travels.

Oh, and back home: after fourteen hours in the air, at the airport there was no question about when I was going through the passport check and customs. I wanted to apologize to those in the non-U.S. citizens queue, given that it looked like they would be waiting for at least an hour to get through. So to all my non-U.S. citizen readers who landed at Dulles airport the afternoon of 12 January 2014, on behalf of the entire United States of America, I offer my most profound apologies.

The reading material of choice during the trip: Sway, Evolution of Useful Things, Code, and Don’t Put Me In, Coach. The last book was light reading material, and perfect for unwinding on the plane. If you ever spent time as a bench-warming basketball player (I’m looking in my own direction here), I highly recommend it.

Tests Result in Better Programs, and Programmers

It’s widely noted and accepted that tests result in better code. My conjecture is that tests result in vastly improved programmers.

Even with a good test suite, the first implementation of a project will be utilitarian, complying with the edict of “just get it working!”. These are the red and green phases of the red/green/refactor cycle: red is when the tests fail, and green is when they pass. Too often, that initial version is the end of the cycle for many programmers and projects, which is why, in general, version 1 of a program is horrible:

win101logo

In the refactoring phase the project goes from simply “working” to being well-crafted. Thus the programmer as well has changed their mindset from writing “working” code to writing well-crafted code, and the programmer has been elevated to a higher level of craftsmanship.

Refactoring code is when a programmer can develop their skills and push the boundaries of their knowledge, such as delving into advanced object-oriented programming and metaprogramming, which are often considered dauntingly complex and risky. But used properly, they can dramatically improve a project, and, what I believe is more important, they can dramatically improve the programmer.

In my experience, this was first proven to me when I was working on a large C++ project, my module being our persistence layer, providing a J2EE-like (but vastly simpler) interface with PostgreSQL. Although in that era (the late 1990s) Test-Driven Development hadn’t become popular, my daily goal was to write more test code than “real” code, usually between 500 and 1500 lines per day, usually, but not always, writing my tests first.

As my test suite grew and I became more confident in its ability to catch errors, I pushed my knowledge of C++, particularly with templates and the Standard Template Library, eventually reaching a point when I really understood the magic of the STL code. Had I not had such tests, and thus confidence, I likely would have refrained from pushing myself into an area that previously was in the dark, murky area of C++ to me, and I would not have gained valuable expertise as a C++ programmer.

In fact, I’d say that it’s only because of tests that I’ve felt confident learning a new language, such as when I rewrote DiffJ in JRuby, a language I hadn’t used. Having a thorough test suite made the learning process much easier.

The bottom line is that programmers must understand that testing includes refactoring (including of the test code itself), and that the refactoring phase is where programmers, and projects, can become significantly better.

Oh, and with thorough refactoring that horrible version 1 can eventually lead to significant improvements:

macos

Shorter Directory Names in Emacs Ibuffer and Mode Line

I use Emacs and Z shell exclusively, and have been frustrated at times about the features in Z shell that are not in Emacs, one being the shortened directory names. These are used for navigating (changing directories and displayed in the prompt), and are especially useful with long/deep directory names.

In Java projects, this is beneficial, because the hierarchy tends to be so deep. For example, with the path:

   /home/me/proj/com/mycompany/projectx/trunk/src/main/java/com/mycompany/util/FooUtil.java

In Z shell the directory can be given a short name with:

   hash -d projectx=/home/me/proj/com/mycompany/projectx/trunk

So the file can be referred to as:

   less ~projectx/src/main/java/com/mycompany/util/FooUtil.java

And this can be done more extensively than that, such as:

   hash -d pxutil=~projectx/src/main/java/com/mycompany/util

And used as:

   less ~pxutil/FooUtil.java

That also makes for a shorter directory name displayed in the prompt, such as this for DiffJ:

    (~diffj)-[master]-(0)%

Which without Zsh hashes would be:

    (/home/jpace/proj/org/incava/diffj)-[master]-(0)%

So I’ve become used to that in my shell, but have been frustrated with my editor, which shows the full path in the default ibuffer display, running out to over 100 columns:

    TokenList.java   1699 /home/jpace/proj/org/incava/diffj/src/main/java/org/incava/diffj/code/TokenList.java

It also seemed redundant that in the above, TokenList.java is displayed twice, so the filename-and-process column could be replaced with only the directory name.

I looked around for examples about changing the columns in ibuffer, but found few, mainly the same code repeated on various sites online, variations of this from http://www.emacswiki.org/emacs/IbufferMode:

;; Use human readable Size column instead of original one
(define-ibuffer-column size-h
  (:name "Size" :inline t)
  (cond
   ((> (buffer-size) 1000000) (format "%7.3fM" (/ (buffer-size) 1000000.0)))
   ((> (buffer-size) 1000) (format "%7.3fk" (/ (buffer-size) 1000.0)))
   (t (format "%8d" (buffer-size)))))

That was a starting point, but I couldn’t find anything more about revising the filename-and-process column, which shows the above long path.

The following code adds a ‘dirname’ format to ibuffer to shorten the filename and show only the directory:

(defvar jep:filename-subs
  '(("/home/jpace" . "~")
    (".*/Projects/com/mycompany/is/" . "~is/")
    ("/home/jpace/proj/org/incava/" . "~incava/")
    ("/$" . "")))

(define-ibuffer-column dirname
  (:name "Directory"
	 :inline nil)
  (if (buffer-file-name buffer)
      (str-replace-all (file-name-directory (buffer-file-name buffer)) jep:filename-subs)
    (or dired-directory
	"")))

(setq ibuffer-formats
      '((mark modified read-only " "
	      (name 30 30 :left :elide)
	      " "
	      (size 9 -1 :right)
	      " " dirname)
	(mark modified read-only " "
	      (name 30 30 :left :elide)
	      " "
	      (size 9 -1 :right)
	      " " filename-and-process)
	(mark " "
	      (name 30 30 :left :elide)
	      " " filename-and-process)))

Now my ibuffer looks like:

[ diffj ]
    TokenList.java    1699 ~incava/diffj/src/main/java/org/incava/diffj/code
    build.gradle       892 ~incava/diffj
    etc....

Much better.

Coincidentally, I had the same complaint about the modeline, which defaults to the same display as ibuffer, with the full path of the file. So I tweaked the modeline code to do the same:

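(defvar jep:modeline-subs
  '(("/home/jpace/" . "~/")
    (".*/Projects/com/mycompany/is/" . "~is/")
    ;; Full path here: str-replace-all applies this entry before the
    ;; "/home/jpace/" one, so a bare "/proj/org/incava/" would leave
    ;; "/home/jpace~incava/..." behind.
    ("/home/jpace/proj/org/incava/" . "~incava/")
    ("/$" . "")))

(defun jep:modeline-dir-abbrev ()
  (str-replace-all default-directory jep:modeline-subs))

;; default-mode-line-format is obsolete; set the default value of
;; mode-line-format instead.
(setq-default mode-line-format
              (list ""
                    'mode-line-modified
                    "%25b--"
                    " ["
                    '(:eval (jep:modeline-dir-abbrev))
                    "] "
                    "%[("
                    'mode-name
                    "%n"
                    'mode-line-process
                    ")%]--"
                    "L%l--"
                    "C%c--"
                    '(-3 . "%P")
                    "-%-"))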

Note that the modeline and ibuffer code above uses this function, which I put in ~/.emacs.d/lisp/str.el:

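(defun str-replace-all (str pats)
  "Return STR with each (REGEXP . REPLACEMENT) pair in PATS applied.
The pairs are applied from the last element of PATS to the first."
  (if (null pats)
      str
    (let* ((pat (car pats))
           (lhs (car pat))
           (rhs (cdr pat)))
      (replace-regexp-in-string lhs rhs (str-replace-all str (cdr pats))))))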

So there is my contribution. Here it is in action:

emacs_dirname

And it’s included with my Emacs configuration on GitHub.

Extended Colors with Git

It’s well-known that Git supports colors for many operations, such as diff, and that these colors are customizable. Examples online show how to modify ~/.gitconfig to set various fields for git functions, such as this for the “diff” command:

    [color "diff"]
        meta = yellow bold
        frag = magenta bold
        old = red bold
        new = green bold

All other examples that I saw online used the same color names, that is, the ANSI colors for terminals: black, red, green, yellow, blue, magenta, cyan, white.

What I wondered, after rewriting Glark so that extended colors could be used for highlighting matches, since terminals now support those colors, is whether Git supported extended colors. Digging around through the Git source code shows that a color, in addition to being one of the above color names, can also be a number between 0 and 255, per the ANSI escape codes.

The nit is that that’s not an RGB value; it’s an ANSI code that corresponds to a color, and for people accustomed to RGB, the ANSI code is dissimilar enough to be confusing.

An RGB value can be mapped to an ANSI code with a simple equation (this is from the Rainbow Ruby Gem):

def to_code red, green, blue
  r, g, b = [ red, green, blue ].map { |v| (6 * v / 256.0).to_i }
  16 + 36 * r + 6 * g + b
end

Each of the values for red, green and blue is scaled to between 0 and 5, then offset for the ANSI version of RGB.
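A few worked values may make the mapping more concrete. The sketch below restates the equation under a hypothetical name, rgb_to_ansi, and prints each sample color's code using that color as the foreground. It assumes components in the range 0 to 255, so results land in the 6x6x6 color cube, codes 16 through 231.

```ruby
# Maps 0-255 RGB components onto the 6x6x6 color cube (codes 16-231).
# rgb_to_ansi is a hypothetical name for the equation above.
def rgb_to_ansi(red, green, blue)
  r, g, b = [red, green, blue].map { |v| (6 * v / 256.0).to_i }
  16 + 36 * r + 6 * g + b
end

[[255, 0, 0], [0, 128, 255], [255, 255, 255]].each do |rgb|
  code = rgb_to_ansi(*rgb)
  # Print each color's code using that color as the foreground.
  puts "\e[38;5;#{code}m#{rgb.inspect} -> #{code}\e[0m"
end
```

Pure red (255, 0, 0) scales to (5, 0, 0), giving 16 + 36 * 5 = 196, and white scales to (5, 5, 5), giving 231.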

This Ruby snippet dumps the list of colors as foregrounds:

(0 .. 255).each do |c|
  puts if c > 0 && (c % 10) == 0
  printf "\e[38;5;#{c}mabc %3d\e[0m  ", c
end
puts

The colors, as foregrounds on white:

fg_on_white

As foregrounds on black:

fg_on_black

As backgrounds on white:

bg_on_white

And as backgrounds on black:

bg_on_black

Also note that the Git color fields are of the format “[attributes] foreground [background]”, where if a second color is given, then it is used as the background.

The same is true for ANSI codes, so that the second ANSI code specified will be used as the background.

Attributes are the following: bold, blink, ul (for underline), reverse, and dim. More than one can be used.
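To see what such a field amounts to at the terminal level, here is a minimal sketch that turns a value like "bold 190 22" into an escape sequence. It is not how Git itself parses the field: color_escape is a hypothetical helper that handles only the five attributes listed above and numeric colors, using the standard SGR codes (1 bold, 2 dim, 4 underline, 5 blink, 7 reverse) and the 38;5;N and 48;5;N forms for 256-color foreground and background.

```ruby
# Builds an escape sequence from "[attributes] foreground [background]".
# color_escape is a hypothetical helper, not part of Git.
def color_escape(spec)
  attrs = { "bold" => 1, "dim" => 2, "ul" => 4, "blink" => 5, "reverse" => 7 }
  codes = []
  colors_seen = 0
  spec.split.each do |word|
    if attrs.key?(word)
      codes << attrs[word]
    elsif word.match?(/\A\d+\z/) && word.to_i <= 255 && colors_seen < 2
      # The first number is the foreground (38;5;N), the second the background (48;5;N).
      codes << (colors_seen.zero? ? 38 : 48) << 5 << word.to_i
      colors_seen += 1
    else
      raise ArgumentError, "unrecognized word in color spec: #{word}"
    end
  end
  "\e[#{codes.join(';')}m"
end

puts color_escape("bold 190 22") + "sample text" + "\e[0m"
```

Named colors such as "red" are left out to keep the sketch short, so they raise an ArgumentError here.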

Not that I recommend it, but a valid configuration could thus be:

    [color "diff"]
        meta = bold 190 22
        frag = blink 189 89
        old = blink bold 160 143
        new = reverse bold blink ul 52 227

Which looks like the following:

example_diff

Again, I don’t recommend it. I’ll post an update when I’ve settled on a color theme.

Review of LinuxMint 14 KDE

I finally upgraded my main personal machine to Linux Mint KDE 14. That machine was running Mint 11, for around two years, but when I began working with Scala, I discovered that Emacs 24 is much better for Scala support. Not finding Emacs 24 in the Mint 11 repositories, I finally took the time and effort to upgrade.

Now I wonder why I’d waited so long.

The first immediate improvement, albeit superficial, was that KDE uses blue as its primary color. It makes sense that Mint would of course choose, well, a minty green as its color, but that is one of my least favorite colors, reminding me of a 1970s refrigerator. I didn’t care for the brown of Ubuntu, and missed the blue of Fedora, so now I’m back, in a way.

I switched to the Oxygen theme, which is nicely dark, mostly dark greys. The other themes I looked at seemed to be excessively noisy, and I like a minimal desktop experience, with no peripheral distractions.

As I complained before, when I temporarily switched from Gnome to KDE (and back again), in KDE the fonts look, in a word, horrible. Absolutely horrible, if you will permit me two words.

This time I googled around a bit, and found this thread.

The summary of that is to go to the Fonts settings and set them all to Ubuntu 10 Regular, except for Fixed Width (Ubuntu Mono 12), Small (Ubuntu 9), and Windows title (Ubuntu 10 bold).

Set the following:

  • Anti-aliasing: enabled
  • Exclude range: unchecked
  • Sub-pixel rendering: RGB
  • Hinting style: slight

Install Windows fonts via: “sudo apt-get install ttf-mscorefonts-installer”. The command line app is necessary because you’ll need to accept the license agreement, which has no equivalent for the GUI-based package managers.

That has made a huge difference in the appearance. I do not understand why these would not be the default settings in KDE, so mark that as one advantage in the favor of Gnome. One advantage. I haven’t found a second one.

I also installed the Inconsolata font (the package “ttf-inconsolata”), which I tried out with Emacs after reading about it as being highly recommended. After a while I went with (back to, actually) DejaVu Sans Mono, font size 9, since I found Inconsolata characters to be too wide.

The KDE UI takes a little while to get used to, especially seeing all apps in the panel, not just the ones for the current workspace. I also set the shortcut for the start menu to alt-F1, after trying to re-map the Windows key, with a modicum of success.

This upgrade makes me feel like I’m back in my early Red Hat / Fedora days, with the UI clean and responsive. I haven’t yet tried out activities under KDE, but I am planning to.

On that note, being a KDE neophyte, I’m looking for a KDE book, and would appreciate any recommendations.

Recommended Books

This is a list of what I consider the best books relevant to programming, and perhaps, even to life.

Two of the older books, which shaped me during my C++ days, are Large-Scale C++ Software Design (by John Lakos) and Object-Oriented Design Heuristics (by Arthur Riel). The former is very relevant to Java projects, perhaps even more so now, since by its nature, Java leaves the package hierarchy unclear. That is, java.io and java.util are intertwined, with mutual dependencies. Having a clear package/module hierarchy can make it easier to understand the levels within a project, and how their behavior aggregates up through those levels.

Object-Oriented Design Heuristics is simply a must-read for anyone doing OO programming, which means essentially every programmer. Maybe even the functional programmers. It’s been a long time since I’ve read it, but one salient point that I remember is that managers are discouraged. That is, nonsense such as ResourcePoolManager, which often (usually? always?) act as god classes, and break the OO design by centralizing behavior instead of having it tightly integrated with the client classes. The book goes extensively into inheritance, and how it is abused and distorted, where classes within a hierarchy often break down in terms of being properly decoupled.

If you program, you must, absolutely must, understand regular expressions. I cannot fathom how anyone doing text-based programming (which also means essentially everyone) could not use regular expressions. Every day. Every hour. Much of the magic of dealing with huge code bases is simply mastering regular expressions. So the must-read book on this topic, by another Jeff, is Mastering Regular Expressions, by Jeffrey Friedl. It’s another book that if you read through chapter 3, you’ll understand 80% of regular expressions, an application of the 80/20 rule, AKA the Pareto principle.

For Scrum, the best book is Agile Software Development with Scrum, by Ken Schwaber and Mike Beedle. Unlike those released by the book-by-the-pound publishers, this one is concise and direct, clocking in at around 150 pages. As with the book above, reading just three chapters will give you 80% of the full understanding of Scrum.

Hackers and Painters, by Paul Graham, is another must-read, probably the best book I’ve read about the mindset of great programmers, and creators/artists in general. It makes clear that programming is more art than science, and more painting than engineering, contrary to the roots, and biases, of this field. It’s so good that I recommend it not only to programmers, but to those fortunate people who have a programmer in their lives, and want to understand their mental processes.

Delving into non-programming books, I suggest The Fifth Discipline, by Peter Senge. It’s more about businesses than about software, but one point that programmers should appreciate is that there are systemic behaviors, or accidental behaviors, as Fred Brooks might say, resulting from the organization of a system, such as a business, but also programming teams, and even the design of their code. Trying to work around an inappropriately designed system can be extremely frustrating, even to the point of futility.

This might be an odd choice, but I’m going to shoehorn it in here anyway, because I like it so much: The Gentle Art of Verbal Self-Defense, by Suzette Elgin. It discusses patterns in communication, such as loaded questions, those with invalid premises (“If you really cared about getting this feature done, you wouldn’t be wasting time refining the build process”). Especially in this era where multi-cultural teams are the norm, it is imperative that people understand the deeper significance of the words they choose. It is somewhat like The Fifth Discipline, such that communication itself results in systemic behaviors and dynamics within a team.

I don’t think it’s too controversial to say that programmers are gifted, and by that I don’t mean that they are superior to others – just different. Very different. As Paul Graham elucidates in Hackers and Painters, there is simply a different internal mechanism among programmers, which results in their (illogical, to some) obsession with “minor details”, that is, the core of their work. (Programming is mostly just minor details, aggregated into huge systems.) Programmers also tend to be highly sensitive, as could be expected of people who have to closely watch for even slight variations in behavior or performance. So I suggest The Gifted Adult, by Mary-Elaine Jacobsen, which shows insights into how those people think and behave, and why their levels of concentration can be easily disrupted. I especially think that software managers should read it, to understand why the cats that they are trying to herd behave so much unlike “normal people”.

Wrapping up my recommendations is another really odd choice: The Path Between the Seas, by David McCullough. It’s about the building of the Panama Canal, about how, after years of failed attempts, the canal was finally built thanks to two things: building railroads (to move workers and to remove dirt) and eliminating disease (yellow fever). John Stevens, the head engineer, devoted a significant amount of time (over a year, as I recall) building railroads instead of digging. So once the digging phase began, resources could be moved much more quickly, and with dirt removed, it reduced the exposure to mudslides from the frequent heavy rains. Dr. William Gorgas also solved a resource problem by eliminating disease, thus keeping workers productive. How this fits with software is that much of our work isn’t the digging per se; we have a one-off, with the system around the project itself, such as resources, both material and human.

Glark 1.10.0

Glark 1.10.0 is ready to be released, after a couple of years of being not a high priority for me. I was inspired to rewrite it when I looked through the code, much of it written early in my Ruby days. It began as a Perl script, and retained that scriptitude through its life.

At times while rewriting Glark, I wish that I’d blogged that experience. The short description is that I had a few primary guidelines:

  • Test as thoroughly as possible, ideally one test (at least) per feature. A feature is essentially the same as an option. Each source file/class should have an equivalent test case.
  • Keep files and classes small, and relatively even in size.
  • Simplify the set of options.
  • Eliminate global variables.
  • Add one feature per day.

Following these principles resulted in a code base I am much more satisfied with.

Previously much, if not most, of Glark was “field-tested”, a euphemism for “I tried it out a while ago, and I think it worked then.” As the test suite grew, the code became much easier to refactor with confidence.

Regarding the size of files and classes, I used a simple metric:

% wc lib/**/*.rb | sort -rn

And then I usually tackled what was at the bottom of the list.

The average file is now 59 lines long, with the largest being 201 lines, and the smallest, 10 lines. In the previous implementation, the smallest file was 102 lines, the largest, 761 lines, and the average, 328 lines.
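Figures like these can also be gathered with a few lines of Ruby instead of eyeballing the wc output. The following is only a sketch, with line_stats as a hypothetical helper. It assumes it is run from the project root, with the sources under lib/ as in the wc command above.

```ruby
# Summarizes a list of per-file line counts; line_stats is a hypothetical helper.
def line_stats(counts)
  return nil if counts.empty?
  { smallest: counts.min,
    largest: counts.max,
    average: (counts.sum / counts.size.to_f).round }
end

# Count the lines of every Ruby file under lib/, as the wc command does.
counts = Dir.glob("lib/**/*.rb").map { |file| File.foreach(file).count }
stats = line_stats(counts)
puts stats ? stats.inspect : "no Ruby files found under lib/"
```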

Option processing was the major chunk of code tangled through the code base, primarily because there was a single Options class, a singleton used essentially everywhere throughout the code. I first split that into the subsets of options, such as those for the input options, for matching, and for output, with their equivalent submodules using only those option objects instead of the global/singleton.

This was further cleaned up by removing the option processing from what I eventually labeled their “specs”, the values that determined the behavior of the submodule. One idea that I had is that eventually those specs could be passed in from outside of Glark itself, for usage by external programs, such as PVN.

Adding one feature per day, which I’ve written about previously, motivated me to do some non-coding things, mainly writing documentation. I’ve realized that much of the distinguishing functionality of Glark hasn’t been well documented, and the man page has now increased from 927 lines to 1126.

It’s been an interesting evolution of the behavior of Glark, as much of its early functionality has been added into grep, such as colorized matches and context around them. Elaborating on what I wrote in the readme/man page:

Glark extends grep by matching complex expressions, such as “and”, “or”, and “xor”. This is useful in a case such as “I’m looking for ‘foo’ and ‘bar’, within three lines of each other.” Expressions can be nested arbitrarily deeply, such as this one, also from the man page: “glark --and=5 Regexp --xor parse --and=3 boundary quote”, meaning: (within 5 lines of each other: (/Regexp/ and (/parse/ xor (within 3 lines of each other: /boundary/ and /quote/))))
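To give a feel for what “and within N lines” and “xor” mean, here is a simplified sketch. It is not how Glark implements them, and Glark's exact distance semantics may differ; the method names are invented for this example, and it works on an array of lines already in memory.

```ruby
# Returns [i, j] pairs of 1-based line numbers where line i matches `left`,
# line j matches `right`, and the two are at most `distance` lines apart.
def and_matches(lines, left, right, distance)
  left_hits  = lines.each_index.select { |i| lines[i].match?(left) }
  right_hits = lines.each_index.select { |i| lines[i].match?(right) }
  left_hits.product(right_hits)
           .select { |i, j| (i - j).abs <= distance }
           .map { |i, j| [i + 1, j + 1] }
end

# Returns the 1-based numbers of lines matching exactly one of the patterns.
def xor_lines(lines, left, right)
  lines.each_index
       .select { |i| lines[i].match?(left) ^ lines[i].match?(right) }
       .map { |i| i + 1 }
end

# Usage:
#   lines = File.readlines("some_file.rb", chomp: true)
#   and_matches(lines, /boundary/, /quote/, 3)
#   xor_lines(lines, /parse/, /quote/)
```

Comparing every pair of hits is quadratic, which is fine for illustration but not something to run on large files.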

Glark handles file, directory, and path arguments, optionally recursing directories to a certain depth, and processing path arguments as a set of files and directories. .svn and .git subdirectories are automatically excluded.

I realize, with a bit of guilt, that that defies the Unix principle of keeping programs small, and with minimal overlapping functionality, since much of that is already done by the “find” command. However, some of this behavior was included in early Glark, and grep itself has the “-r” option for recursing directories, so I wanted to extend that to be more advanced, in part because when running Glark on Windows systems, there is no “find” command.

Binary files are excluded by default, but compressed or archived files can have their extracted contents searched.

This rolls into Glark behavior that I’d wanted for a while, mainly for searching Jar files for class names, which I previously did via shell scripting, such as:

for i in *.jar; do jar tf "$i" | glark --label="$i" Exception; done

That now can be done with:

glark --binary-files=list Exception *.jar

Glark can use a per-project configuration file, so different projects can have their own Glark parameters, such as different files to include and exclude for searching, and different colors for pattern highlighting. My goal there is that it can add feedback for when one is working in different projects, such as highlighting matches in Java code differently than in Ruby. Colorizing is still only on a per-project basis, not on the file type itself, which I’m considering adding, since it might be helpful to distinguish matches in Ant build files from those in Gradle.

I am doing some final testing, and then the Ruby Gem should be available. I am looking for maintainers to repackage Glark as an RPM and a Debian package, although I will probably release unofficial packages for those within a few days as well.

Stati of Projects

I’ve been jumping around from one project to another on an as-needed basis, and here is the current status of each.

RIEL – I’ve been updating the colorizing code, adding the functionality to use extended color codes on ANSI terminals, instead of the default 10.

Glark – In the midst of a major rewrite, for both code purity and functionality. The main changes so far are extensions to the path/file arguments, bringing in some functionality from find. Glark will also (with the imminent integration of RIEL) support extended colors.

PVN – This project was in heavy development until a month ago, when it went into a state of waiting for updates to Glark, since it will be using Glark for its seek (searching) subcommand.

DiffJ – This project I rewrote in JRuby, but that was a little too slow for a command-line application; even heavily tweaked, I couldn’t get the startup time under two seconds. So I re-rewrote it in Java, and intend to revisit it to add more intelligent code comparisons, such as understanding that “if (a == b) foo();” is the same as “if (a == b) { foo(); }”.

IJDK – Mostly dormant at this point. When I’ve rewritten Java projects, I’ve tried to extract the generic code from them and add them to IJDK, but I haven’t been heavily involved with any of my Java projects lately.

Java-Diff – A while ago this was brought up to date to use generics, and was refactored for code clarity. There hasn’t been any reason to update it since then.

DoctorJ – Alas, this is dormant. Some of its warnings have been integrated into the Java compiler, such as mismatched parameter names. It still goes beyond that level of pedanticalness, so it is most suitable in the development of projects where documentation is paramount, such as APIs.

Tests Result in Better Coders

We know that tests result in better code. They also result in better programmers.

The first implementation of a project will often be utilitarian, complying with the edict of “just get it working”. Code will be written and tweaked until the tests pass, but there are few cycles, if any, devoted to refining the code to be higher quality. Thus the red/green/refactor cycle is limited to just red and green.

The refactoring phase is where the code goes from simply working to being well-crafted. As the mindset of the programmer changes from being focused on utility to being focused on quality, they program differently, accordingly.

With the principle of quality being foremost, and with the confidence from having a thorough set of tests, the programmer can also push the boundaries of their knowledge, such as delving into advanced object-oriented programming and metaprogramming, which can be dauntingly complex and risky. Those areas do not necessarily have an immediate impact on functionality, so when a programmer is in functionality/utility mode, they’re less likely to employ those techniques. However, in quality mode, a programmer will feel more justified, and confident, in using those techniques, which over the long term can bring a dramatic improvement to a project.

Often forgotten is that both code and tests should be refactored. Test code that is unclear, misleading, or just wrong can be very frustrating to someone trying to understand a body of code, since the tests are the best starting point.

The bottom line is that programmers must understand that testing includes refactoring (including of the test code itself), and that the refactoring phase is where programmers, and projects, can become vastly better.
