Writing a (Ruby) compiler in Ruby bottom up

This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.

Eigenclasses

Eigenclasses, or meta-classes, in Ruby are effectively regular classes that get "injected" into the inheritance chain, but are hidden when calling #super and #class (Ruby is sneaky like that; I'm not convinced I like that part). For more on eigenclasses there's _why's excellent old article.

Fixing this is deceptively easy, at least for the basic case. Let's get this example working first:


    class Foo
    
      def self.bar baz
        puts baz
      end
    
    end
    
    Foo.bar("Hello world")
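
As a reference point, here is a small sketch of where standard Ruby (MRI) puts #bar for this example. It only uses core reflection methods (#singleton_class, #instance_methods, #ancestors) and says nothing about how the compiler represents things internally. Note that in MRI the eigenclass of Foo has the eigenclass of Object as its direct superclass, but Class is still in its ancestor chain, and Foo is not.

```ruby
class Foo
  def self.bar baz
    puts baz
  end
end

eigen = Foo.singleton_class

# #class hides the eigenclass: Foo still reports Class.
puts Foo.class                               # Class

# The method lives on the eigenclass only.
puts eigen.instance_methods(false) == [:bar] # true
puts Class.instance_methods.include?(:bar)   # false

# The eigenclass descends from Class, not from Foo.
puts eigen.ancestors.include?(Class)         # true
puts eigen.ancestors.include?(Foo)           # false

Foo.bar("Hello world")                       # Hello world
```

So instances of Foo never see #bar; only the Class object Foo does.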

def self.bar basically means (using the convention that "#klass" refers to the real class of the object, rather than whatever #class returns):

"if self.klass is not an eigenclass, then make it one (create a new subclass of self.klass that is marked as an eigenclass). Define #bar as a method on self.klass". Note that "self" in this case is the Class object Foo, so Foo.klass will be Class, and the new eigenclass that we define a method on will be a subclass of Class, not a subclass of Foo.

Similarly, the alternative syntax (which we'll get back to later) of "class <

(where green represents Class objects)

This represents this:


    class Foo
    end
    
    Foo.new

Now we want to do the def self.bar from above. The resulting objects should look like this:

For illustration: The non-class case

Consider if we instead did:


        class Foo
        end
        
        ob = Foo.new
        
        def ob.bar
        end

In that case, we'd expect #bar to get defined on an Eigenclass that pops into the hierarchy like this:

Notice how the Eigenclass for the non-class case gets introduced as the new class of the instance. The object isn't "really" an instance of Foo any more, but an instance of the eigenclass, which in turn inherits from Foo.

Conceptually it's exactly the same. It's just a matter of which object we introduced a new method on, and therefore which object we attached an eigenclass to.
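
The same relationship can be observed in standard Ruby (MRI). This is only a sketch of the observable behaviour using core reflection methods, not of how the compiler will implement it; the variable name other is added here for contrast.

```ruby
class Foo
end

ob = Foo.new
other = Foo.new   # a second instance, for comparison

def ob.bar
  :bar
end

eigen = ob.singleton_class

puts ob.class                         # Foo (the eigenclass stays hidden)
puts eigen.superclass                 # Foo
puts ob.singleton_methods == [:bar]   # true
puts other.respond_to?(:bar)          # false
```

Only ob picks up #bar; other instances of Foo are unaffected, because the eigenclass sits between ob and Foo rather than inside Foo itself.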

We won't deal with this case any further in this part; we'll wrap that up later, especially as instance-specific eigenclasses are rare (they should be used sparingly: they are expensive, requiring a Class object for each object you attach methods to).

Implementation

First we get 51c003f out of the way, which simply expands the Scope class slightly to forward and provide defaults for more cases. Most of these will be made use of later.

Then let's start to hook in the actual eigenclass treatment. We're now aiming to handle the def self.foo case, which the parser will deliver as [:defm, [:self, :foo], ...].

All of the following is in 8ad4e6f


       def compile_defm(scope, name, args, body)
         scope = scope.class_scope
     
    +    if name.is_a?(Array)
    +      compile_eigenclass(scope, name[0], [[:defm, name[1], args, body]])
    +      return Value.new([:subexpr])
    +    end

compile_eigenclass looks like this:


    +  def compile_eigenclass(scope, expr, exps)
    +    @e.comment("=== Eigenclass start")
    +
    +    ob = [:index, expr, 0]
    +    ret = compile_eval_arg(scope, [:assign, ob,
    +                                   [:sexp, [:call, :__new_class_object, [scope.klass_size, ob, scope.klass_size]]]
    +                                  ])
    +    @e.save_result(ret)
    +
    +    let(scope,:self) do |lscope|
    +      @e.save_to_local_var(:eax, 1)
    +
    +      # FIXME: This uses lexical scoping, which will be wrong in some contexts.
    +      compile_exp(lscope, [:sexp, [:assign, [:index, :self ,2], "<#{scope.local_name.to_s} eigenclass>"]])
    +
    +      exps.each do |e|
    +        compile_do(lscope, e)
    +      end
    +      @e.load_local_var(1)
    +    end
    +    @e.comment("=== Eigenclass end")
    +
    +    return Value.new([:subexpr], :object)
    +  end
    +

Basically, we create a new class object, save the pointer to it as a local variable temporarily, and alias self. We then assign a name to the class, and compile methods in that new context.

This is all fairly similar to compile_class, and there might be opportunities to combine the two more later (and there are almost certainly holes/bugs in compile_eigenclass as it stands).

There's also one big inefficiency: If you define multiple methods, you get multiple eigenclasses chained. That's ok. It's just wasteful, and potentially slow. It's easier to fix that once I get around to adding an easier way of identifying an eigenclass.
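
To make the chaining concrete, here is a toy model in plain Ruby. It is not the compiler's object layout: ToyClass, ToyObject, define_singleton_naive, define_singleton_reusing and eigenclass_depth are all hypothetical names made up for this sketch. The "naive" version always wraps the current class in a new eigenclass, which is what produces the chain; the "reusing" version applies the "only if klass is not already an eigenclass" rule described earlier.

```ruby
# Toy object model; every name here is hypothetical.
class ToyClass
  attr_reader :superclass, :methods_table

  def initialize(name, superclass = nil, eigenclass: false)
    @name = name
    @superclass = superclass
    @eigenclass = eigenclass
    @methods_table = {}
  end

  def eigenclass?
    @eigenclass
  end

  # Walk the superclass chain looking for a method body.
  def lookup(meth)
    klass = self
    while klass
      return klass.methods_table[meth] if klass.methods_table.key?(meth)
      klass = klass.superclass
    end
    nil
  end
end

class ToyObject
  attr_accessor :klass   # the "real" class, which may be an eigenclass

  def initialize(klass)
    @klass = klass
  end
end

# Always create a fresh eigenclass: simple, but eigenclasses pile up.
def define_singleton_naive(ob, meth, body)
  ob.klass = ToyClass.new("<eigenclass>", ob.klass, eigenclass: true)
  ob.klass.methods_table[meth] = body
end

# Only create an eigenclass if klass is not one already.
def define_singleton_reusing(ob, meth, body)
  unless ob.klass.eigenclass?
    ob.klass = ToyClass.new("<eigenclass>", ob.klass, eigenclass: true)
  end
  ob.klass.methods_table[meth] = body
end

# Count how many eigenclasses sit between the object and its ordinary class.
def eigenclass_depth(ob)
  depth = 0
  klass = ob.klass
  while klass && klass.eigenclass?
    depth += 1
    klass = klass.superclass
  end
  depth
end

foo = ToyClass.new("Foo")
a = ToyObject.new(foo)
b = ToyObject.new(foo)

define_singleton_naive(a, :x, -> { 1 })
define_singleton_naive(a, :y, -> { 2 })
define_singleton_reusing(b, :x, -> { 1 })
define_singleton_reusing(b, :y, -> { 2 })

puts eigenclass_depth(a)       # 2
puts eigenclass_depth(b)       # 1
puts a.klass.lookup(:x).call   # 1 (still found, just one hop further away)
```

Both variants resolve methods correctly; the naive one just makes every lookup walk a longer chain and allocates one class per method, which is why the reusing variant needs a cheap way to tell whether a class is an eigenclass.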

But what is the let method above?

A convenience method for local variables

Basically, I needed a simple way of defining a local scope, and allocating stack space for it. compile_let did that, but only for code in tree form. So the new let is a rewrite that extracts out that part of the code, and leaves compile_let a tiny little stub:


      def let(scope,*varlist)
        vars = Hash[*(varlist.zip(1..varlist.size)).flatten]
        lscope = LocalVarScope.new(vars, scope)
        if varlist.size > 0
          @e.evict_regs_for(varlist)
          @e.with_local(vars.size) do
            yield(lscope)
          end
          @e.evict_regs_for(varlist)
        else
          yield(lscope)
        end
      end
    
      
      # Compiles a let expression.
      # Takes the current scope, a list of variable names as well as a list of arguments.
      def compile_let(scope, varlist, *args)
        let(scope, *varlist) do |ls|
          compile_do(ls, *args)
        end
        return Value.new([:subexpr])
      end
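
The only slightly cryptic line in let is the one that builds vars. As a small standalone sketch (the variable list [:self, :tmp] is just an example, not taken from the compiler), this is what that expression produces: each variable name is mapped to a 1-based slot number, which lines up with compile_eigenclass storing to local variable 1 after calling let(scope, :self).

```ruby
# Same expression as in let, applied to an example variable list.
varlist = [:self, :tmp]
vars = Hash[*(varlist.zip(1..varlist.size)).flatten]

puts vars == { self: 1, tmp: 2 }   # true

# An equivalent spelling on Ruby 2.1 or later:
alt = varlist.zip(1..varlist.size).to_h
puts alt == vars                   # true

# With no variables the mapping is empty, and let just yields
# without reserving any stack space.
empty = Hash[*([].zip(1..0)).flatten]
puts empty.empty?                  # true
```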

Before we move on to some minor supporting changes in the runtime library, there was one bug that slowed me down substantially that I want to briefly mention:

compile_assign was the only user of what at some point became a severely broken mechanism for saving register content. With the proper register allocation there's no excuse for it any more, and so Emitter#save_register has to go. The problem was that if you indicated a register should be saved, it didn't properly mark it as freed up again, and so we got spurious pushl's onto the stack in positions which meant you could push and pop values in the wrong order. The replacement in compile_assign is sometimes less efficient, but also less broken.

It also cuts some lines out of emitter.rb.

Runtime library changes

The bootstrapping of the object model is still a bit messy. One aspect that has been there "forever" but which became more obvious when working on this part, is the chicken and egg problem with Class and Object. With the support for re-opening classes, we have the mechanism for making the situation better.

The change in 1e1ce21 improves the situation by explicitly, "manually", linking Class into the list of subclasses of Object. This means Class will properly inherit methods of Object after all, and lets us clean up Class next.


     class Object
    +  # At this point we have a "fixup" to make as part of bootstrapping:
    +  #
    +  #  Class was created *before* Object existed, which means it is not linked into the
    +  #  subclasses array. As a result, unless we do this, Class will not inherit methods
    +  #  that are subsequently added to Object below. This *must* be the first thing to happen
    +  #  in Object, before defining any methods etc:
    +  #
    +  %s(assign (index self 4) Class)
    +
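
To illustrate why the ordering matters, here is a toy model in plain Ruby. It assumes that defining a method pushes it down eagerly into the method tables of all linked subclasses; that is an assumption made for this sketch, and ToyClass, link_subclass, define and install are hypothetical names rather than the runtime's actual API.

```ruby
# Toy model of eager method propagation; every name here is hypothetical.
class ToyClass
  attr_reader :name, :vtable

  def initialize(name)
    @name = name
    @vtable = {}       # method name => body
    @own = {}          # methods defined directly on this class
    @subclasses = []   # classes that receive our methods
  end

  def link_subclass(klass)
    @subclasses << klass
  end

  def define(meth, body)
    @own[meth] = true
    install(meth, body)
  end

  protected

  # Install a method and push it down to linked subclasses
  # that have not defined their own version.
  def install(meth, body)
    @vtable[meth] = body
    @subclasses.each do |sub|
      sub.install(meth, body) unless sub.own?(meth)
    end
  end

  def own?(meth)
    @own.key?(meth)
  end
end

toy_class  = ToyClass.new("Class")    # created first, before Object exists
toy_object = ToyClass.new("Object")

toy_object.define(:early, -> { "from Object" })
puts toy_class.vtable.key?(:early)    # false: Class is not linked yet

toy_object.link_subclass(toy_class)   # the "manual" fixup
toy_object.define(:late, -> { "from Object" })
puts toy_class.vtable.key?(:late)     # true
```

In this model, anything defined on Object before the link is made never reaches Class, which mirrors the comment's insistence that the fixup be the first thing to happen in Object.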

The above lets us strip out the superfluous Class#== and depend on Object#== in 4c0c349. At the same time (and in the same commit) we fix Class#to_s and Class#inspect to use the ugly Class#name. We also "manually" set the class of Class to Class, and its name to the raw string "Class".
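
For comparison, this is the circular relationship as standard Ruby (MRI) reports it, and which the bootstrap has to recreate by hand. It is only a check against the reference implementation using core methods, not output from the compiler.

```ruby
puts Class.class.equal?(Class)          # true: Class is an instance of itself
puts Object.class.equal?(Class)         # true: Object is an instance of Class
puts Class.ancestors.include?(Object)   # true: yet Class also inherits from Object
puts Class.name == "Class"              # true
puts Class.to_s                         # Class
```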

Conclusion

These changes leave us with what's needed to compile almost all of the tokenizer code pretty much unmodified, and so we're now rapidly closing in on the initial goal of compiling the s-exp subset of the parser. We'll look at the next batch of changes for that next.

However, I will make a change going forward. After part 44 or 45, rather than batching up changes and trying to cover specific subjects, I'll be pushing out changes as soon as I have something ready to commit, and will mention the commits I think are worth discussing in much shorter posts. I'll then follow up more rarely with longer articles covering specific areas of the compiler or larger changes, when I think I have something more worthwhile to say about a specific component.

The main reason is that it takes a lot of effort to write these larger posts - 2-3 times as long as actually making the changes (including time to restructure some of the commits to make more sense for the articles), and frankly I want to make faster progress on actually being able to use the compiler.

This will also simplify my git handling substantially - currently I manage two repos in addition to my local working copies - one for my drafts, and the public one on Github. Going forwards I'll push changes to the Github repo right away. Branches will also no longer reflect a specific article.

I hope this will lead to more interesting developments rather than fewer, though recent parts of this series have been very much a mixed bag of commits that are only related by what goal I've been working towards rather than what code they've touched anyway.

Writing a (Ruby) compiler in Ruby bottom up - step 43 2015-02-02