Improving Performance of DiffJ/JRuby

DiffJ is nearly ready for release, but I’ve not been content with its performance: the JRuby implementation is significantly slower than the pure Java version.

My changes were based on the recommendations on the JRuby wiki.

Before any optimization, running diffj against a pair of Java files gave these times (user, system, and total in seconds; cpu as a percentage):

user    :   4.84
system  :   0.19
cpu     : 184.20
total   :   2.72

Following the suggested changes, I added the -client argument to the Java process, which resulted in:

user    :   4.99
system  :   0.21
cpu     : 184.80
total   :   2.80

So performance actually worsened.

Next I passed -Djruby.compile.mode=OFF to the Java process:

user    :   4.92
system  :   0.17
cpu     : 184.80
total   :   2.75

Again, performance worsened slightly.

With both the -client and -Djruby.compile.mode=OFF arguments, performance was still down, as one would expect now:

user    :   4.92
system  :   0.18
cpu     : 186.00
total   :   2.73
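The four configurations above differ only in two JVM arguments. This is a minimal sketch of how they map onto the Java command line, assuming DiffJ is launched as a jar. The helper java_command, the jar name diffj.jar and the file names are hypothetical. Only the -client and -Djruby.compile.mode=OFF flags come from the runs above.

```ruby
# Hypothetical helper: builds the java command line for one test configuration.
# Only -client and -Djruby.compile.mode=OFF come from the runs above; diffj.jar is a placeholder.
def java_command(jar, args, client: false, interpret_only: false)
  cmd = ['java']
  cmd << '-client' if client                           # request the HotSpot client VM
  cmd << '-Djruby.compile.mode=OFF' if interpret_only  # turn off JRuby's compilation of Ruby code
  cmd + ['-jar', jar] + args
end

[[false, false], [true, false], [false, true], [true, true]].each do |client, off|
  puts java_command('diffj.jar', %w[Old.java New.java], client: client, interpret_only: off).join(' ')
end
```

Building the command as an array means it can be passed to system(*cmd) without any shell quoting.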

So I went through my code more carefully, measuring the time to require each file, and found two salient problems.
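Per-file require times can be measured with a few lines of Ruby. The sketch below is one way to do it, using a monotonic clock. The helper require_times and the library names are placeholders, so substitute a project's own files. Any library that is already loaded reports almost no time, because require returns early without loading it again.

```ruby
# Returns [library, seconds] pairs, slowest first. A library that is already
# loaded reports almost no time, because require returns early without reloading it.
def require_times(libs)
  libs.map do |lib|
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    require lib
    [lib, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
  end.sort_by { |_, secs| -secs }
end

# Placeholder standard-library names; substitute the project's own files.
require_times(%w[json time]).each do |lib, secs|
  printf("%-8s %.4f s\n", lib, secs)
end
```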

The first was that I was dynamically creating several hundred methods in the RIEL ANSIColor class, one for each combination of decoration, foreground color and background color (such as “bold_red_on_white”). I refined that code, along with the RIEL Log and Loggable classes and the extensions to the Ruby String class, so that the color methods are created dynamically only as they are needed.

Thus the dynamically defined method “bold” is truly a use of the decorator pattern.

That resulted in a slight improvement:

user    :   4.68
system  :   0.22
cpu     : 186.80
total   :   2.62
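The lazy approach can be sketched as follows. This is a hypothetical illustration, not RIEL's actual code. ColorString, codes_for and the color tables are invented for the example. The sketch uses method_missing together with define_method, so each color method is created the first time it is called. Later calls then go straight to the real method.

```ruby
# Hypothetical sketch, not RIEL's code: a String subclass whose color methods
# (e.g. bold_red_on_white) are defined the first time they are called.
class ColorString < String
  DECORATIONS = { 'bold' => 1, 'underline' => 4 }.freeze
  COLORS = %w[black red green yellow blue magenta cyan white].each_with_index.to_h.freeze

  # Maps a name such as "bold_red_on_white" to ANSI codes [1, 31, 47]; nil if it is not a color name.
  def self.codes_for(name)
    fore, back = name.to_s.split('_on_', 2)
    codes = fore.to_s.split('_').map { |w| DECORATIONS[w] || (COLORS[w] && 30 + COLORS[w]) }
    codes << (COLORS[back] && 40 + COLORS[back]) if back
    codes.empty? || codes.include?(nil) ? nil : codes
  end

  def method_missing(name, *args, &block)
    codes = self.class.codes_for(name)
    return super unless codes && args.empty?

    # Define the method once, so later calls bypass method_missing entirely.
    self.class.send(:define_method, name) { "\e[#{codes.join(';')}m#{self}\e[0m" }
    send(name)
  end

  def respond_to_missing?(name, include_private = false)
    !self.class.codes_for(name).nil? || super
  end
end

puts ColorString.new('warning').bold_red_on_white
```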

I wondered about the overhead of RubyGems, so I removed RIEL as a gem and instead added it to the DiffJ source tree. Performance improved significantly:

user    :   3.74
system  :   0.20
cpu     : 184.80
total   :   2.13
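Loading a copy of a library from the source tree comes down to putting its directory on the load path. This is a minimal sketch that assumes RIEL's lib directory was copied to vendor/riel/lib. Both that layout and the helper name use_vendored are hypothetical.

```ruby
# Hypothetical helper: put a vendored library directory at the front of the load path,
# so `require` finds the copy in the source tree before any installed gem.
def use_vendored(lib_dir)
  path = File.expand_path(lib_dir)
  $LOAD_PATH.unshift(path) unless $LOAD_PATH.include?(path)
  path
end

# Assumed layout: RIEL's lib directory copied to vendor/riel/lib in the DiffJ tree.
use_vendored(File.expand_path('vendor/riel/lib', __dir__))
# require 'riel'  # would now load from the source tree
```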

Combining all of the above resulted in the best performance:

user    :   3.62
system  :   0.19
cpu     : 181.20
total   :   2.10
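To put the runs in perspective, the snippet below computes the change in total time for each run relative to the unoptimized baseline. The labels are shorthand for the runs above. The combined changes work out to roughly a 23% reduction, from 2.72 to 2.10 seconds.

```ruby
# Total (wall-clock) seconds from the runs above.
TOTALS = {
  'baseline'                   => 2.72,
  '-client'                    => 2.80,
  'compile.mode=OFF'           => 2.75,
  '-client + compile.mode=OFF' => 2.73,
  'lazy color methods'         => 2.62,
  'RIEL outside RubyGems'      => 2.13,
  'all combined'               => 2.10
}.freeze

# Percentage change relative to a baseline; negative means faster.
def pct_change(base, value)
  (value - base) / base * 100.0
end

TOTALS.each do |label, secs|
  printf("%-28s %.2f s  %+6.1f%%\n", label, secs, pct_change(TOTALS['baseline'], secs))
end
```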

That is acceptable to me, and I’ll be releasing version 1.3.0 of DiffJ soon. Of course, it’s on GitHub, so feel free to download and build it.

sort{ |a, b| a <=> b }.ing with Ruby

One idiom from Perl that I’ve missed in Ruby is the ability to chain comparisons together, such as:

my @a = qw{ this is a test };

$, = ", ";

print sort { substr($a, -1) cmp substr($b, -1) || length($a) <=> length($b) } @a;
print "\n";

Which results in the output:

a, is, this, test

In Ruby it’s a little more complicated, since Perl treats zero as false but Ruby does not. However, the nonzero? method of Ruby's Numeric objects essentially performs this conversion for use in a boolean expression: it returns nil if the number is zero, and the number itself otherwise. So in Ruby, the above code would be:

words = %w{ this is a test }

puts words.sort { |a, b| (a[-1] <=> b[-1]).nonzero? || a.length <=> b.length }.join(", ")
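Ruby also has a different idiom for the same ordering: sort_by with an array key. Arrays compare element by element, so a [last character, length] key sorts by the last character first and breaks ties by length, with no explicit comparator. Note that sort_by is not guaranteed to be stable, which matters only when two keys are equal.

```ruby
words = %w{ this is a test }

# Each key is [last character, length]; Array#<=> compares the keys element by element.
sorted = words.sort_by { |w| [w[-1], w.length] }
puts sorted.join(", ")   # prints "a, is, this, test"
```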

One additional note: if you’re using this in a spaceship method (“<=>”) for the Comparable module, remember that it must return a number. So if you chain comparisons together, the final term should be 0. That covers the case where every previous comparison returned nil (meaning the values were equal). This bit me during some recent DiffJ work, and here is an example of a corrected method:

class Java::net.sourceforge.pmd.ast::Token
  include Comparable, Loggable

  # ...

  def <=> other
    (kind <=> other.kind).nonzero? ||
      (image <=> other.image).nonzero? ||
      0
  end
end

That’s DiffJ reopening the PMD Token Java class and mixing in Ruby's Comparable module, so tokens can be sorted in Ruby collections.

On that note, DiffJ is now in rough beta. I’m using it for my work (refactoring and cleaning up legacy Java code). I just corrected a glitch in the Token code, ironically enough, in supporting its use as a Hash key. I’d neglected to implement the eql? method, mistakenly thinking that Hash uses the Comparable code.
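Hash looks up keys with eql? and hash, not with <=>. A class that defines its ordering through Comparable therefore also needs both of those methods if its instances are used as Hash keys. Here is a minimal sketch. It uses a hypothetical plain-Ruby TokenSketch class with the same kind and image fields, standing in for the PMD token so the example runs without PMD.

```ruby
# Hypothetical stand-in for the PMD token, with the same two fields.
class TokenSketch
  include Comparable

  attr_reader :kind, :image

  def initialize(kind, image)
    @kind = kind
    @image = image
  end

  # Chained comparison; the trailing 0 covers the case where every field is equal.
  def <=>(other)
    (kind <=> other.kind).nonzero? ||
      (image <=> other.image).nonzero? ||
      0
  end

  # Hash keys are matched with eql? and hash, not <=>, so both must agree with it.
  def eql?(other)
    other.is_a?(TokenSketch) && (self <=> other).zero?
  end

  def hash
    [kind, image].hash
  end
end

counts = Hash.new(0)
[TokenSketch.new(1, 'foo'), TokenSketch.new(1, 'foo')].each { |t| counts[t] += 1 }
puts counts.size  # prints 1: equal tokens share one key
```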

With that fix, the JRuby implementation of DiffJ produces the same output as the Java implementation. It’s somewhat slower, so I’ve been investigating ahead-of-time (AOT) compilation, but that doesn’t seem to have much effect.

I just realized that another Perl feature I’ve missed (and, until writing the code above, hadn’t used in 10 years) is setting the output field separator with the “$,” variable. Along the same lines, my RIEL library modifies the to_s method of Array to put “, ” between elements, since the default (in Ruby 1.8) is to have no space between elements.
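Here is a sketch of the same idea written as a refinement. This is a substitution on my part: RIEL changes Array#to_s directly, while a refinement (Ruby 2.1 and later) limits the change to files that opt in with using. The module name CommaArrays is hypothetical.

```ruby
# Hypothetical module name; the refinement only affects files that call `using CommaArrays`.
module CommaArrays
  refine Array do
    # Separate elements with ", " instead of the default formatting.
    def to_s
      join(", ")
    end
  end
end

using CommaArrays

puts %w[this is a test].to_s   # prints "this, is, a, test"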

JRuby Issue with Regexp.last_match

In the DoctorJ project I’ve been rewriting the Javadoc parser, and I did the initial rewrite in Ruby. That was straightforward, so I then began migrating it to JRuby. The idea was that the code could gradually morph from Ruby to Java via JRuby, for example by reimplementing regular expressions with java.util.regex.Pattern instead of Regexp.

The first step of the Ruby-to-JRuby transition was simply to change the shebang line to “#!/usr/bin/jruby”. However, tests then failed. Finding the source of those failures was difficult because the tests were so high-level: they checked the parsed Javadoc, not the result of matching the Java comment against the Javadoc regular expression.

Eventually it became clear that the issue was with JRuby itself, not with my code. The JRuby code is easy to understand and well formatted. It also closely matches the C source of Ruby itself, which makes issues even easier to diagnose.

In this case the issue was in JRuby's RubyRegexp class. When it sets the value that Regexp.last_match returns, it keeps a reference to the region (the positions of the capture groups) of the current match. However, that reference is to a “live” (mutable) object, and subsequent matches with the same regular expression update that region object. As a result, the first MatchData returned by Regexp.last_match ends up with the same captures as the latest match.

Here is a RubySpec that describes this issue:

describe "JRUBY-6141: MatchData#captures" do
  before :all do
    "first, last".scan(/(first|last)/) do
      @firstmatch ||= Regexp.last_match
    end
    @lastmatch = Regexp.last_match
  end

  it "returns first value from Regexp.last_match after all String#scan iterations" do
    @firstmatch.captures[0].should == "first"
  end

  it "returns last value from Regexp.last_match after all String#scan iterations" do
    @lastmatch.captures[0].should == "last"
  end
end
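The same expectations can also be checked with plain Ruby assertions, without mspec. On an implementation without the bug (MRI, or a JRuby with the fix), this script prints only its final message. On an affected JRuby, the first check fails.

```ruby
# Plain-Ruby version of the JRUBY-6141 checks (no mspec needed).
first_match = nil
"first, last".scan(/(first|last)/) do
  # Keep the MatchData from the first iteration only.
  first_match ||= Regexp.last_match
end
# After scan finishes, Regexp.last_match refers to the final match.
last_match = Regexp.last_match

raise "first match was overwritten: #{first_match.captures.inspect}" unless first_match.captures[0] == "first"
raise "unexpected last match: #{last_match.captures.inspect}" unless last_match.captures[0] == "last"
puts "Regexp.last_match behaves as expected"
```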

The fix is to clone the Region object when setting the Regexp.last_match reference.
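The aliasing problem, and why cloning fixes it, can be modeled in a few lines of plain Ruby. Region and Match here are hypothetical stand-ins, not JRuby's actual Java classes. A single mutable region is reused for every match. Each match either shares that region (the bug) or keeps its own copy (the fix).

```ruby
# Hypothetical stand-ins for JRuby internals, for illustration only.
Region = Struct.new(:captures)
Match  = Struct.new(:region)

# Simulates successive matches that reuse one mutable region object.
def run_matches(words, clone_region:)
  region = Region.new([])
  words.map do |word|
    region.captures = [word]  # each new match overwrites the shared region
    Match.new(clone_region ? region.dup : region)
  end
end

live  = run_matches(%w[first last], clone_region: false)
fixed = run_matches(%w[first last], clone_region: true)
puts live.first.region.captures.inspect   # prints ["last"]  (first match clobbered)
puts fixed.first.region.captures.inspect  # prints ["first"]
```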

I submitted this issue as JRUBY-6141. Since JRuby uses RubySpec, I provided the above test along with a Git patch, and both were committed to the JRuby source.

Getting Started with JRuby

I was asked to summarize my experiences with starting up a project with JRuby, so I’ll do so here.

My experience with JRuby spans only a couple of months and two projects, Armitage and Mackworth, both graphically oriented psychological testing programs.

Overall I have been very impressed with JRuby. I’m finding it an excellent way to combine the cleanliness of Ruby code with Java libraries (in my case, Swing, in the two projects above). I am also prototyping the new Javadoc processor for DoctorJ, writing it in Ruby around a regular expression. I will migrate it to pure Java, one chunk of code at a time, via JRuby. My earlier rewrite went poorly: without JRuby as a bridge, I had to convert the Ruby code straight to Java, which was too drastic a change to go well. And it didn’t.

JRuby is so much like Ruby that the differences come as a surprise. As I wrote before, Ruby gems are not JRuby gems: JRuby looks for gems in its own directory hierarchy. In my experience Java threads behave poorly alongside Ruby threads, and I found it easier to use only Java threads.

In Swing, there are issues with making a JFrame full-screen via its extended state and undecorated attributes, for example (in a subclass of JFrame):

    set_extended_state JFrame::MAXIMIZED_BOTH
    set_undecorated true

On Linux, this works properly in Java 1.6, but not in Java 1.5. On Windows, it works correctly with neither Java 1.5 nor 1.6.

Distributing a JRuby program is easy, although I wish (as with many programming languages and environments) that an installer were more tightly integrated with the language. The best solution I found was to combine the complete JRuby jar with my *.class and *.rb files into a single jar, with a manifest (META-INF/MANIFEST.MF) containing “Main-Class: MyAppMain”.
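The manifest step can be scripted in Ruby as well. This is a minimal sketch that assumes the jar contents are staged in a directory before the jar tool runs. The helper write_manifest and the staging directory are hypothetical, and MyAppMain is the placeholder class name from the text. The trailing newline matters, because the jar tool does not parse a last line that lacks one.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: writes META-INF/MANIFEST.MF into a staging directory.
def write_manifest(staging_dir, main_class)
  meta_inf = File.join(staging_dir, 'META-INF')
  FileUtils.mkdir_p(meta_inf)
  path = File.join(meta_inf, 'MANIFEST.MF')
  # End every line with a newline; a final line without one is not parsed.
  File.write(path, "Manifest-Version: 1.0\nMain-Class: #{main_class}\n")
  path
end

Dir.mktmpdir do |dir|
  puts File.read(write_manifest(dir, 'MyAppMain'))
end
```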

MyAppMain is the JRuby equivalent of a Java main class, with its main method annotated with a Java signature, such as:

    class MackworthTestMain
      java_signature 'void main(String[])'
      def self.main args
        # ...
      end
    end

Building a JRuby application is easy with Rake. My Rakefile for Mackworth contains the code common to it and to the Rakefile for Armitage.
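As a rough illustration of such a build, here is a minimal Rakefile sketch. It is not the Mackworth Rakefile. The task names, paths and manifest file are hypothetical, and the jrubyc target-directory option (-t) should be checked against the installed JRuby's jrubyc --help. The shell commands run only when a task is invoked.

```ruby
require 'rake'

# Hypothetical paths and task names; the shell commands run only when a task is invoked.
CLASSES_DIR = 'build/classes'
JAR_FILE    = 'build/app.jar'

desc 'Compile Ruby sources to Java class files'
task :compile do
  sh 'jrubyc', '-t', CLASSES_DIR, *FileList['lib/**/*.rb']
end

desc 'Package the compiled classes into a jar with a manifest'
task jar: :compile do
  # c = create, f = jar file name, m = manifest file; -C switches to the classes directory.
  sh 'jar', 'cfm', JAR_FILE, 'MANIFEST.MF', '-C', CLASSES_DIR, '.'
end

task default: :jar
```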

JRuby and Rake

In my quest to write nothing but Ruby code in my copious free time, I’ve begun learning JRuby. I’m finding it an ideal solution for my Java projects, such as DiffJ and DoctorJ, which depend on Java because they use the PMD parser.

Likewise, I am migrating my build code away from Ant, using Gradle for the soon-to-be-released DiffJ 1.2.0. I also began trying to use JRuby's Ant library from Rake, but was confounded by an error with this simple Rakefile:

require 'ant'

task :greet do
  puts "hello"
end

Running it gave:

% jruby -S rake
/usr/bin/rake:27:in `require': no such file to load -- rake (LoadError)
	from /usr/bin/rake:27

The fix was simple; changing my mindset was harder. JRuby looks for gems in its own directory hierarchy, not in the Ruby one.

So installing the Rake gem as follows fixed the problem:

% sudo jruby -S gem install rake
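To see the two hierarchies directly, this small script reports which implementation is running and where RubyGems installs and searches for gems. It uses RUBY_ENGINE, Gem.dir and Gem.path. Running it once with ruby and once with jruby should show separate gem directories. The helper name gem_locations is hypothetical.

```ruby
require 'rubygems'

# Reports which Ruby implementation is running and where RubyGems installs and looks for gems.
def gem_locations
  { engine: RUBY_ENGINE, install_dir: Gem.dir, search_path: Gem.path }
end

gem_locations.each { |key, value| puts "#{key}: #{Array(value).join(', ')}" }
```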