Tag: compiler

2010-06-12 21:21 UTC Writing a (Ruby) compiler in Ruby bottom up - step 25

This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.

First of all, here's the biggest reason I've been so excrutiatingly slow with getting this part together (at least that's my story, I guess I've had other things on my plate too...):

Tristan is 13 months now, and a real menace to my laptop (pulling off keys and drooling on the screen) and anything else within reach, and a real attention seeker...

So, whenever I'm slow at posting a new part, I'll blame him. I have no part in the delays at all, of course. None. I'm completely faultless...

Towards define_method via closures

For starters, here's the commit that covers most of this

To support define_method we need to support blocks with arguments. Really, full closures.

If you haven't already, you should read my post on closures

We want this:

    define_method(:foo) do |a,b|
       # Do something with a,b
    end

to be mostly equivalent to:

    def foo a,b
       # Do something with a,b
    end

For our attr_* implementation we actually want more:

    def attr_reader sym
      define_method sym do
        %s(ivar self sym) # FIXME: Create the "ivar" s-exp directive.
      end
    end

(or, as it turns out, not really that, as I realize the above can be simplified - if we have a lookup function to lookup the symbol to instance var slot, we can just use our index primitive to get the address; we'll return to that, but the above is the current state of attr_reader)

What that calls for is actually proper closures. One step harder.

There's the other issue that the above will be really inefficient. As in, requiring a hashtable lookup on every call inefficient, when we can statically determine the offset for any instance variable we know about at compile time. We'll get back to that at some point too, but if you've been following this series you know I care about getting things working first, efficient second.

For now attr_reader and friends still serve as a useful test case for basic closure support.

Baby steps

The first step towards closures is actually easy (we've already done it). We have our :lambda primitive that creates an anonymous function (which isn't really anonymous, it's just that the name is quitely created in the compiler and not accessible to the programmer).

Adding an environment

But an anonymous function is only of relatively limited use if you can't bind variables to it - it's not much more than a function pointer in that case.

So how do we let "sym" hang around, and how do we take advantage of that for define_method?

First, let us consider how you can call a lambda in Ruby:

  • yield
  • Proc#call

We could implement Proc#call via yield by passing the block as an argument to a method, if we wanted to, but that would probably be more inefficient than implementing both in terms of a primitive.

But this means those are the only things that has to explicitly know how to deal with the environment.

That gives us an interesting option. We could turn:

    c,d = somevalues
    do |a,b|
       ... uses c,d 
    end

into

    class SomePrivateUniqueClassName
        def initialize c,d
           @c,@d = c,d
        end

def call(a,b) ... code here with access to c,d rewritten end end

And we could have it inherit Proc. yield in that case is just rewritten to block.call

This does however have the troubling implication of messing with self, and as this irb session shows, we'd have to fix that:

>> lambda { p self }.call
main
=> nil
>> 

The other alternative is to create a lightweight environment binding of sorts. Oh wait, we already have that. It's called an object. The problem is not the class above as such, but creating a full class for each block. We could do something like this instead:

    class Proc
      def initialize & block
         # So Proc#initialize takes a block, but we're using this
         # to create blocks... Uh oh, I sense endless recursion.
         # We need to make it work so we can initialize a Proc
         # both from a block and from a raw function pointer
      end

def call *args %s(call @block_func (res args)) end end

Our code needs to be rewritten, so that blocks get wrapped in code to create the appropriate Proc object. Additionally, if the method uses variables from the surrounding scope, we need to alter the surrounding method and the block to refer to instance variables in this object, and if any of those variables are arguments to the surrounding scope, we need to copy them.

An example

Here's a simple example using a closure. Note the use of s-expressions here because we're now starting to get bitten by the changes we've done to turn Ruby code by default into method calls (the way it should be) without having put in the pre-requisite work to implement the number classes and Kernel methods (such as print to replace the libc printf). Ignore that for now.

    def mkcounter step
      cnt = 0
      lambda do
        %s(assign cnt (add cnt step))
        %s(printf "cnt: %d\n" cnt)
      end
    end

cnt = mkcounter(5) cnt.call puts "Calling again..." cnt.call puts "Done"

And here's the expected output:

cnt: 5
Calling again...
cnt: 10
Done

And here's the syntax tree after we've done the required transformations (more about that in a second):

    [:do,
     [:defm,
      :mkcounter,
      [:step],
      [:let,
       [:__env__, :__tmp_proc],
       [:sexp, [:assign, :__env__, [:call, :malloc, [8]]]],
       [:assign, [:index, :__env__, 1], :step],
       [:assign, [:index, :__env__, 0], 0],
       [:do,
        [:assign,
         :__tmp_proc,
         [:defun,
          ".L2",
          [:self, :__closure__, :__env__],
          [:let,
           [],
           [:sexp,
            [:assign,
             [:index, :__env__, 0],
             [:add, [:index, :__env__, 0], [:index, :__env__, 1]]]],
           [:sexp, [:printf, "cnt: %d\n", [:index, :__env__, 0]]]]]],
        [:sexp, [:call, :__new_proc, [:__tmp_proc, :__env__]]]]]],
     [:assign, :cnt, [:call, :mkcounter, 5]],
     [:callm, :cnt, :call],
     [:call, :puts, [[:sexp, [:call, :__get_string, :".L0"]]]],
     [:callm, :cnt, :call],
     [:call, :puts, [[:sexp, [:call, :__get_string, :".L1"]]]]]

That's a bit of a mouthful, so lets go through it the new bits

    [:let,
       [:__env__, :__tmp_proc],

This is added to hold a pointer to the environment we create to hold cnt and step, and to hold a pointer to the function we use to create the Proc respectively.

       [:sexp, [:assign, :__env__, [:call, :malloc, [8]]]],
       [:assign, [:index, :__env__, 1], :step],
       [:assign, [:index, :__env__, 0], 0],

Then we allocate the environment. We copy the argument, and from now on we use (index env 1) to refer to step. We then carry out the cnt = 0 statement. cnt and step are both moved into the environment because they are used inside the closure.

(You may have already picked up that we currently won't handle nested closures - lets leave some fun for later)

        [:assign,
         :__tmp_proc,
         [:defun,
          ".L2",
          [:self, :__closure__, :__env__],

Then we create a function and assign the address of the function to __tmp_proc. As you can see there are tree synthetic arguments to it:

self,__closure__ and __env__. The first is, as you'd expect, the object the method is called on. The second is to be used when passing blocks to a method, and the third is used to refer to the environment when inside.

(Something I just realized: We're begging for a name clash, if there's a closure defined inside that needs a separate environment. Obnoxious details; later)

            [:assign,
             [:index, :__env__, 0],
             [:add, [:index, :__env__, 0], [:index, :__env__, 1]]]],

And this monstrosity is what cnt = cnt + step was turned into.

Now, to make this work, we obviously need this __new_proc function, and a Proc#call that's sensible. Here's our current Proc:

    class Proc
      # FIXME: Add support for handling arguments (and blocks...)
      def call
        %s(call (index self 1) (self 0 (index self 2)))
      end
    end

# We can't create a Proc from a raw function directly, so we get # nasty. The advantage of this rather ugly method is that we # don't in any way expose a constructor that takes raw functions # to normal Ruby # %s(defun __new_proc (addr env) (let (p) # Assuming 3 pointers for the instance size. Need a better way for this (assign p (malloc 12)) (assign (index p 0) Proc) (assign (index p 1) addr) (assign (index p 2) env) p ))

Very simplistic and rather ugly, but it does the work. As you can see, the environment pointer is stored in the Proc object, and Proc#call hides the ugliness.

You may notice that this is inefficient - an extra level of indirection. Indeed it is - we can in theory make the environment part of the Proc object and save an indexed lookup, and there's a bunch of other tricks waiting. But again, that's optimizations we wont deal with now.

For now, lets move on to how to do the appropriate changes to get the example output above.

Some refactoring, and a few rewrites

Almost all the changes for this is in the Compiler class, and the code in question is now in the transform.rb file. There are some other small changes, but I don't think they need to be covered in detail.

Specifically, transform.rb represents the start of a bit of refactoring.

Some constructs in the Compiler class can be built easily on top of more primitive operations. lambda was an example of that.

Overall, a lot of things can be done by rewriting the parse tree. Doing it that was has the distinct advantage that it's possible to switch the various rewrites on/off to debug their effects, and to see the result long before it ends up as machine code.

It also helps isolate the CPU architecture specific parts further.

As a first step, transform.rb contains methods that solely rewrite the syntax tree. They are still currently part of the Compiler class, and do make use of the Emitter (specifically Emitter.get_local to get a unique label), but this can be changed later.

Unfortunately not all of these rewrites are small and simple.

In this round, we're left with three:

  • rewrite_strconst
  • rewrite_let_env
  • rewrite_lambda

rewrite_strconst used to be in compiler.rb and is mostly unchanged.

rewrite_lambda

rewrite_lambda replaces the lambda handling in compiler.rb with a rewriting step that looks like this:

    def rewrite_lambda(exp)
      exp.depth_first do |e|
        next :skip if e[0] == :sexp
        if e[0] == :lambda
          args = e[1] || []
          body = e[2] || nil
          e.clear
          e[0] = :do
          e[1] = [:assign, :__tmp_proc, 
            [:defun, @e.get_local,
              [:self,:__closure__,:__env__]+args,
              body]
          ]
          e[2] = [:sexp, [:call, :__new_proc, [:__tmp_proc, :__env__]]]
        end
      end
    end

What this rewrite does, is use our Array#depth_first method from extensions.rb to descend down the parse tree looking for :lambda nodes. It will skip any :sexp nodes, as they are used as "guards" to mark areas that should not be touched by rewrites.

When it finds a :lambda, it will replace the :lamda with this:

    %s(do
       (assign __tmp_proc
               (defun [result of @e.get_local
                 (self __closure__ __env__ [ + args])
                   [body]))
       (sexp call __new_proc (__tmp_proc __env__))
      )

Effectively defining the lambda (as we did before), but then calling __new_proc with the result. You'll note __tmp_proc is seemingly superfluous. It is in fact a workaround for a slight problem: The call needs to be within a sexp to not be rewritten to a callm (Ruby method call), but the defun can't be within a sexp, or other rewrites won't touch the body of the defun.

We'll clean that up later.

rewrite_let_env

Now for the hairy bit. Really hairy.

The goal of this one is to identify variables in the methods - this used to happen in the parser in a really simplistic way - combined with identifying :lambda nodes (and so this must run before rewrite_lambda) and lifting variables that are used inside the closure into an environment, and update the :let in the surrounding method definition to have a suitable environment.

It then also adds code to allocate the environment, as well as to shadow method arguments. Finally it rewrites accesses to variables that have been lifted into the environment, so that it uses %s(index __env__ offset) instead of the variable name.

Let's go through the code.

    def rewrite_let_env(exp)
      exp.depth_first(:defm) do |e|

We're hunting for :defm nodes only. Everything else will be ignored.

        vars,env= find_vars(e[3],[Set[*e[2]]],Set.new)

First step is to find a candidate set of variables. find_vars is another new method, and we'll go over that next. e[3] is the body of the method. find_vars is recursive, and we pass in the methods arguments (e[2]) as the starting "scope". Third we pass in an empty Set as the the starting point for the environment.

        vars -= Set[*e[2]].to_a

We remove the method arguments, as we'll use vars to create a :let node later on.

        if env.size > 0

If find_vars found any variables to make part of the environment, we need to:

           body = e[3]
           rewrite_env_vars(body, env.to_a)

Rewrite access to the members of the environment

           notargs = env - Set[*e[2]]
           aenv = env.to_a
           extra_assigns = (env - notargs).to_a.collect do |a|
             [:assign, [:index, :__env__, aenv.index(a)], a]
           end

Create assigns to copy all arguments that are used in the closures from the arguments on the stack into the environment

           e[3] = [[:sexp,[:assign, :__env__, [:call, :malloc,  [env.size * 4]]]]]

Allocate the environment

           e[3].concat(extra_assigns)
           e[3].concat(body)

Tack on the extra assigns and the body.

         end
         # Always adding __env__ here is a waste, but it saves us (for now)
         # to have to intelligently decide whether or not to reference __env__
         # in the rewrite_lambda method
         vars << :__env__
         vars << :__tmp_proc # Used in rewrite_lambda. Same caveats as for __env_

e[3] = [:let, vars,*e[3]]

Then we create a :let node with the new variable set, including __env__ and __tmp_proc which we've seen elsewhere.

         :skip
       end
     end

Finally we skip any nodes below the :defm so the depth_first call does not waste time.

find_vars

The purpose of find_vars is to identify variables that should be put in a let node as well as that should be part of a closure environment. This one is hairy, and a clear candidate for looking for ways to clean up later

def find_vars(e, scopes, env, in_lambda = false, in_assign = false)

    return [],env, false if !e

First of all, if we've been called with a nil expression, there are now variables to return, and the same environment passed in.

    e = [e] if !e.is_a?(Array)
    e.each do |n|
      if n.is_a?(Array)

An expression?

        if n[0] == :assign
          vars1, env1 = find_vars(n[1],     scopes + [Set.new],env, in_lambda, true)
          vars2, env2 = find_vars(n[2..-1], scopes + [Set.new],env, in_lambda)
          env = env1 + env2
          vars = vars1+vars2
          vars.each {|v| push_var(scopes,env,v) }

If it's an assignment, then we want to process both the left hand and right hand, but we need to mark the left hand since variables on the left hand introduce a new variable in the scope.

        elsif n[0] == :lambda
          vars, env = find_vars(n[2], scopes + [Set.new],env, true)
          n[2] = [:let, vars, *n[2]]

Special casing on :lambda because if we're handling a closure, and variables found inside the closure that were defined outside means that those variables needs to be lifted into the closure environment we're creating so that they're not allocated on the stack.

        else
          if    n[0] == :callm then sub = n[3..-1]
          elsif n[0] == :call  then sub = n[2..-1]
          else sub = n[1..-1]
          end
          vars, env = find_vars(sub, scopes, env, in_lambda)
          vars.each {|v| push_var(scopes,env,v) }
        end

Finally we special case on :callm and :call when handling the remainder, because they have arguments that are not sub-expressions to be considered when creating the closure environment.

      elsif n.is_a?(Symbol)
        sc = in_scopes(scopes,n)
        if sc.size == 0
          push_var(scopes,env,n) if in_assign

If n is a variable that does not exist in any of the scopes we keep track of, and it is not part of the environment, and it's on the left hand of an assign, we add it to the top scope.

NOTE: This current version is buggy. Not every variable potentially occuring on the left hand side should be added, just the one assigned to.

        elsif in_lambda
          sc.first.delete(n)
          env << n
        end

If it is in a scope and we're in a lambda, then the variable needs to be moved (deleted) from that scope and added to the closure environment.

      end
    end
    return scopes[-1].to_a, env
  end

Finally we return the innermost scope from this level and the environment

rewrite_env_vars

Finally let's look at rewrite_env_vars. When we've gathered together all the variables for the closure environment, we have an array, and any accesses to those variables needs to be rewritten to (index __env__ num). That's done like this, and I hope this one is simple enough not to go through line for line:

    def rewrite_env_vars(exp, env)
      exp.depth_first do |e|
        STDERR.puts e.inspect
        e.each_with_index do |ex, i|
          num = env.index(ex)
          if num
            e[i] = [:index, :__env__, num]
          end
        end
      end
    end

Next steps

We still haven't actually added support for define_method, but we now finally have most of the plumbing. That's the next step.

Then I want to start alternating between something more practical: Making the compiler compile early/simple versions of itself, and let it "eat its own tail" so to speak, and secondly to start refactoring and cleaning it up.



2010-02-22 17:35 UTC Writing a (Ruby) compiler in Ruby bottom up - step 24

This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.

Apologies for the delay... This part was ready before christmas, but for various reasons I never got around to posting it. And to make matters worse I managed to post the wrong part yesterday. Sigh.

The Outer Scope

Stepping back from the attr_* debacle for a bit... I wanted to look at something else, mostly because it affects all my ad-hoc testing.

One of the downsides of starting this without a specific design in mind is that after I decided to go down the Ruby path we've been dragging along quite a bit of cruft.

But that is the cost of experimentation.

In particular, the compiler has a weird distinction between functions and methods: Define something with def outside of a class and it becomes a function, inside it becomes a method.

We need support for functions to make implementation reasonably easy: It makes bootstrapping the object model far easier, as well as integration with the outside C-based world.

But Ruby doesn't have functions.

So what to do?

Recently I wrote about hiding the "low level plumbing" and the change I'm about to show you gets us a bit closer to that while at the same time starting to clean up the function vs. method mess.

In Ruby, self outside of a class is the main object - an instance of Object. Logically then, for a def outside of a class, the method will be defined on `Object. So, we'll create that object.

What about function calls?

Defining and calling functions will still be possible, but only using the s-expression syntax. That reasonably cleanly "hides" the plumbing we use functions for (in fact, it means we could stop the annoying convention I started of prefixing the names with "__" since they effectively now live in their own namespace, though I haven't changed that yet).

Some preparations

First we must rewrite a few functions fully as s-expressions. This is straightforward. For example:

    -def __get_string(str)
    -  s = String.new
    -  %s(callm s __set_raw (str))
    +%s(defun __get_string (str) (let (s)
    +  (assign s (callm String new))
    +  (callm s __set_raw (str))
       s
    -end
    +))

Otherwise the upcoming changes will turn them into methods, which is not what we want.

I'm not going to present all of them. The biggest one is probably __new_class_object, but all of these are straight forward translations.

These changes were done on the master branch prior to starting this feature proper.

Splitting up "defun"

You can follow these remaining changes on the main-object branch

So far defun has been made up by two mostly different branches: If occurring inside a class, it's compiled as a method, if not it'd compile a function.

This is both messy and not very logical.

Going forward we'll separate it into defm that defines a Ruby style method, and defun, which, as before, defines a function. defun will be completely hidden from normal Ruby code - you'll have to dip down into the s-expression syntax to access it.

First our list of compile_* methods in compiler.rb:

    -                   :do, :class, :defun, :if, :lambda,
    +                   :do, :class, :defun, :defm, :if, :lambda,

The new, simplified compile_defun:

    # Compiles a function definition.
    # Takes the current scope, in which the function is defined,
    # the name of the function, its arguments as well as the body-expression
    # that holds the actual code for the function's body.
    #
    # Note that compile_defun is now only accessed via s-expressions
    def compile_defun(scope, name, args, body)
      f = Function.new(args, body,scope)
      name = clean_method_name(name)

# add function to the global list of functions defined so far @global_functions[name] = f

# a function is referenced by its name (in assembly this is a label). # wherever we encounter that name, we really need the adress of the label. # so we mark the function with an adress type. return [:addr, clean_method_name(name)] end

This is really more or less just tearing out the method part, and brings it roughly back to how it was before we added the method calling bit.

But compile_defm is also somewhat simpler than the old method part:

    # Compiles a method definition and updates the
    # class vtable.
    def compile_defm(scope, name, args, body)
      scope = scope.class_scope

f = Function.new([:self]+args, body, scope) # "self" is "faked" as an argument to class methods.

@e.comment("method #{name}")

body.depth_first do |exp| exp.each do |n| scope.add_ivar(n) if n.is_a?(Symbol) and n.to_s[0] == ?@ && n.to_s[1] != ?@ end end

cleaned = clean_method_name(name) fname = "__method_#{scope.name}_#{cleaned}" scope.set_vtable_entry(name, fname, f)

# Save to the vtable. v = scope.vtable[name] compile_eval_arg(scope,[:sexp, [:call, :__set_vtable, [:self,v.offset, fname.to_sym]]])

# add the method to the global list of functions defined so far # with its "munged" name. @global_functions[fname] = f

# This is taken from compile_defun - it does not necessarily make sense for defm return [:addr, clean_method_name(fname)] end

The most important changes?

  • The call to scope.class_scope at the top.
  • The change to call __set_vtable near the bottom instead of hardcoding the asm to set the vtable entry.

So, why those changes?

If you look at GlobalScope, we've added a @class_scope instance variable that is initialized to this:

    @class_scope = ClassScope.new(self,"Object",@vtableoffsets)

This means that if you have a :defm occurring in the global scope (we'll get to that) it will be compiled in that class scope, as a method of class Object.

Poof and our functions are (almost) gone.

(In ClassScope we just return self from the new class_scope method.)

For the second change, adding __set_vtable is just the purist in me wanting to expel as much assembler as possible. Here it is (from lib/core/class.rb):

    # FIXME: This only works correctly for the initial
    # class definition. On subsequent re-opens of the class
    # it will fail to correctly propagate vtable changes 
    # downwards in the class hierarchy if the class has
    # since been overloaded.
    %s(defun __set_vtable (vtable off ptr)
      (assign (index vtable off) ptr)
    )

Ok, so not just the purist in me. In fact, this makes re-opening classes "cleaner" in that this function will actually get reasonably complex as we fix this issue, and also later handle define_method and otherwise deal with cases where no vtable slot is actually available.

Anything else? A few pieces of cleanups to deal with the scope changes, and this little gem / turd depending on your point of view, in lib/core/core.rb:

    +
    +# OK, so perhaps this is a bit ugly...
    +self = Object.new
    +

Ehm. We probably should put that in the compiler, or at least prevent arbitrary assigning to self in user code. But it works for now, and we've done worse in the past. To reiterate: Get things working first. Make them fast/pretty later.

The last vital change is this, in Parser#parse_def:

    -    return E[pos, :defun, name, args, exps]
    +    return E[pos, :defm, name, args, exps]
       end

It does what it looks like - make the parser for the Ruby code spit out only :defm nodes instead of :defun nodes. Our code is now free of functions anywhere outside of s-expressions.

What's the effect?

These changes make this work as expectd:

    class Foo
      def hello
        puts "hello"
      end
    end

def hello_world puts "hello world!" end

%s(puts "foo") # Calls the C function Foo.new.hello # Calls Foo#hello which calls Object#puts puts "world" # Calls Object#puts (via the "main" instance)

hello_world # Calls Object#hello_world (via the "main" instance)

Adding this on the end on the other hand still fails (but it works in MRI) because __set_vtable doesn't propagate the vtable update downwards to subclasses of Object:

    Foo.new.hello_world

When re-opening a class, new methods needs to be added "properly" by processing all child classes and updating their vtables.

The alternative (which moves the cost from the point of defining a method to the point of calling the method), is to implement a proper default method missing that ascends the class hierarchy to look for an implementation before giving an error.

We really should do both, since we eventually need to handle methods without vtable entries.

But that's for a future installment



2009-12-21 16:45 UTC How to implement closures

This is a sort-of interlude to my regular compiler series.

The goal is to give a brief overview of some techniques for implementing closures in a programming language. I will use C for my examples, mostly because it's low level enough that a further translation to assembler etc. is straight forward, and many compilers target C directly anyway.

Before we start, most of the code examples in this post are available in this Gist so you don't need to cut and paste bits and pieces (which won't work anyway, as the text below will omit details such as header includes)

A closure) is in it's simplest form a block of code that can be passed around as a value, and that can reference variables in the scope it was created in even after exiting from that scope.

(For a more formal description look at the Wikipedia page linked above)

A Ruby block forms a closure, for example:

    def foo x
      printf "x is %d\n",x

lambda do x += 1 printf "block: x is %d\n",x

end end

c = foo(5) c.call

c.call

Every time foo is called it will return a closure that has access to the x that was passed in to foo. In the example, x will start out at 5 and get incremented to 7.

The most important implication of this is that any variables used in the closure must be guaranteed to live as long as the closure does.

We'll get back to this example throughout this article.

There are a number of ways of handling this, for example:

  • Instead of a traditional stack, put activation frames (arguments and local variables) for function/method calls on the heap, as a linked list. When creating a closure, you just keep a reference to it. When returning from a function you unlink the frame from the ones below. Presto, the gc handles the remaining dirty details. Some variations of this is called a spaghetti stack
  • Rewriting. You can rewrite any function that may create a closure to create a separate heap allocated environment, and to copy/locate all arguments and variables that may be reused into it. The environment can safely be returned with the closure.
  • Copy. You can copy the current activation frame into a separate closure environment when returning the closure, ensure the closure refers to the variables via a reference, that is replaced with a references to the environment copy.
  • More? Probably...

Each of these have advantages and disadvantages.

We're going to look at the rewriting method mainly, though most of what you find below also apply to the copying method.

These methods are pretty similar - they both involve returning a pointer to the function implementing the code in the closure combined with a pointer to the data in the closure. The difference lies in how that data is accessed.

The rewriting approach creates the closure environment as early as possible, and changes the surrounding function to refer to that environment. It incures the closure creation cost whenever the closure generating function is created, but since local variables are initialized straight into the environment it avoids later copying.

The copy approach delays the creation of the closure environment as much as possible, and then copies the data. It can avoid unnecessary creation cost if there are paths that don't lead to creation of a closure, but when the creation happens, it needs to handle full copying of any objects or object references involved, which may be more expensive.

For both of these methods, care must be taken to create a single environment for any set of closures created during the same execution of the function that creates the closures.

"Fat pointers", objects and thunks

When returning a closure we also need to be able to pass along the environment.

As it turns out, there are a number of ways of handling this:

  • We can create a "fat pointer". I.e. instead of passing around only the address to the code, we also pass around a pointer to the environment, and it is the callers responsibility to load that address onto the stack or into a register so the code can get at it.
  • We can turn the closure into an object, like Ruby's Proc, and simply treat the variables used as instance variables of that object.
  • We can create our own half-assed almost object by storing the function pointer in the environment.
  • We can create a "thunk" on the fly - a small piece of code that will load the address of the environment and jump straight into the real code - and return that instead of a pointer to the real code of the closure. The major downside here is that it requires turning off protection against executing code from the heap.

The "fat pointer" approach is simple but involves more work at every call site, and doubles the size of the data that is passed around.

Turning it into an object is simple, and for my Ruby compiler it's even necessary a lot of the time since most of the time when you handle blocks in Ruby, you'll actually get a Proc object. But it has the full overhead of method dispatch.

The "half-assed object" approach still requires each call site to do a little bit more work, but less than the fat pointer approach. It also doesn't require additional data to be passed around (the function pointer is stored in the environment instead of copied around)

Creating a "thunk" also has overhead, but isn't as scary as it may sound - the code to create is very simple, and really it consists of copying a few bytes around.

We'll start with a "sort of" object, and then take a look at the thunk approach.

Simulating the rewriting method in C

closures-basic.c

C lacks pretty much everything that could make this convenient and easy, so it really lays the implementation bare, for better or worse.

First, let's create a structure to hold the function pointer and environment:


struct closure { void (* call)(struct closure *); int x; };

If we had more local variables in foo, we'd add them to this structure.

Then we need to create a function with the code for the lambda block:


    void block(struct closure * env) {
      env->x += 1;
      printf ("block: x is %d\n", env->x);
    }

Finally we can implement foo:


    struct closure * foo(int x)
    {
      struct closure * closure = (struct closure *)malloc(sizeof(struct closure *));
      closure->x = x;
      printf ("x is %d\n",closure->x);
      closure->call = █
      return closure;
    }

Ewww..

A couple of observations: If we want to be able to return multiple closures (say, an array of them), the variables needs to be acessed via one more indirection, which makes this even more disgustingly convoluted.

It's also annoying, because it means lots of extra overhead in order to handle a situation that might very well never arise.

If the language requires supporting multiple closures (like Ruby), a compiler could support both approaches to optimize - adding the cost of the extra redirection only:

  • when more than one closure can potentially be returned at the same time.
  • when those closures need access to the same variables.

For an example of these restrictions, consider:

     def foo x
       a = 1

b = 2 if x return lambda { a += 1 }

else return lambda { b+= 1 }

end end

First of all, this function is guaranteed to only return one lambda at the time, so applying that rule, we can just assing the appropriate function pointer to the call member variable, and do away with the extra indirection.

Secondly, even if we decide to return both of them at the same time, they are guaranteed to never access the same variables, so we could instead create two distinct closure environments, and still avoid the extra indirection.

But let's take a look at the complete example with the extra indirection anyway, as a worst case scenario:

Indirection hell

closures-indirection.c

First we create an environment for the free variables we wish to "capture":


struct env { int x; };

Our modified closure structure holds a pointer to it, instead of holding the variables:


    struct closure {
      void (* call)(struct env *);
      struct env * env;
    };

The block takes a pointer to the environment:


    void block(struct env * env) {
      env->x += 1;
      printf ("block: x is %d\n", env->x);
    }

And the closure function itself needs to first allocate the environment, and then the closure:


    struct closure * foo(int x) {
      struct env * env = (struct env *)malloc(sizeof(struct env));
      env->x = x;

printf ("x is %d\n",env->x);

struct closure * closure = (struct closure *)malloc(sizeof(struct closure *)); closure->env = env; closure->call = block;

return closure; }

Finally we call it with the env:


    int main() {
      struct closure * c = foo(5);

c->call(c->env); c->call(c->env); }

Fat pointers

The example above is actually really simple to use to illustrate the fat pointer approach. Instead of returning a pointer to the closure object, we simply return the object itself:

closures-fatptr.c


    struct closure foo(int x)
    {
      struct env * env = (struct env *)malloc(sizeof(struct env));
      env->x = x;

printf ("x is %d\n",env->x);

struct closure closure; closure.env = env; closure.call = block;

return closure; }

int main() { struct closure c = foo(5);

c.call(c.env); c.call(c.env); }

As you can see it makes some things simpler (no need for the second memory allocation step).

Variable substitution

An optimization worth keeping in mind is variable substitution. In cases where it can be guaranteed that the variables in question keeps the same value, the need for a separate environment may go away if variables are substituted for their values in the closures body.

Furthermore, if the variables can be guaranteed not to change in any closure returned from the function, then whether or not the variable changes in the function outside the closures, the closures may keep separate environments (and hence do the optimization to avoid indirection) with copies of the variables in question.

Of course this would require the function to carry out any updates once for each generated closure.

(you might have spotted here one of the advantages for functional languages with little or no mutation of variables - they have a lot fewer issues to worry about with respect to sharing of viarable state)

Moving on to thunks

closures-thunks.c

Now that we have the basics down, how do we return just a "plain" function pointer, so we can simplify the call sites?

What we want to create is something like the code below.

Note that we take a shortcut and don't create a proper stack frame - this kind of thing can easily confuse gdb and other debuggers, which is not necessarily very nice.

The thunk below also does not directly allow passing any arguments to the block - if we wanted to do that it gets hairier, since we'd need to manipulate the parameters passed, while the caller doesn't know we've altered the size of the parameter space set aside. In a compiler this would likely be solved by making either the caller or callee aware that it's a closure call, and adjusting the stack accordingly separate from the thunk.

Of course, the downside of using asm here is that it's architecture specific, and the code to generate the thunk will need to be modified accordingly to port the code, but then if you do this in a compiler that is outputting native assembly, you have to do that anyway.

    my_closure_instance:
        pushl $my_environment ; Push the environment onto the stack as the first arg
        call  $the_block      ; Go to the real code
        addl  $4,%esp         ; Throw the environment pointer away
        ret                   ; Return to the caller

Putting some arbitrary values in there, and assembling it with gcc/gas, and then doing objdump -D to the resulting binary gives this (and other bits and pieces I've cut - use the -nostdlib option to avoid dealing with a bunch of initialization code):

    08048055 :
    8048055:       68 00 08 af 2f          push   $0x2faf0800
    804805a:       e8 00 00 00 00          call   804805f 
    804805f:       83 c4 04                add    $0x4,%esp
    8048062:       c3                      ret  

This shows us the values to put in our thunk. Couple of observations: The push is just followed by the address itself, but the call uses an offset from the first byte of the following instruction.

This approach of using structs to generate the thunk, I've blatantly stolen from Joe Damato

because I like how it makes the code that manipulates the thunk more readable:


    struct __attribute__((packed)) thunk {
      unsigned char push_op;
      void * env_addr;
      unsigned char call_op;
      signed long call_offset;
      unsigned char add_esp_ops[3];
      unsigned char ret_op;
    };

struct thunk default_thunk = {0x68, 0, 0xe8, 0, {0x83, 0xc4, 0x04}, 0xc3};

(the __attribute__ stuff is the gcc specific way of avoiding padding for alignement)

We change the rest like this:


    typedef void (* cfunc)();

cfunc foo (int x) { struct env * env = (struct env *)malloc(sizeof(struct env)); env->x = x;

printf ("x is %d\n",env->x);

struct thunk * thunk = (struct thunk *)mmap(0,sizeof(struct thunk), PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); *thunk = default_thunk; thunk->env_addr = env; thunk->call_offset = (void *)&block - (void *)&thunk->add_esp[0]; // Pretty! mprotect(thunk,sizeof(struct thunk), PROT_EXEC); return (cfunc)thunk; }

The typedef is a workaround for C's absolutely atrocious ptr-to-function declarations...

The interesting bit is at the end where we allocate the thunk, and fill in the addresses

Then we cast the thunk data structure to a function pointer. That ought to make you feel dirty, and a bit queasy. It's ok, though.

We use mmap to avoid problems on systems with executable heaps turned off (execshield etc. that wil cause a segmentation fault if you try to execute code in malloc()'d memory or on the stack), and the mprotect() turns off write access to the page after we're done. For a production approach you may want a dedicated allocation function to properly manage this and perhaps avoid doing a separate mmap for every thunk created.

All of this could really be wrapped up into a nice generic function. Something like this:


    struct thunk * make_thunk(struct env * env, void * code) {
      struct thunk * thunk = (struct thunk *)mmap(0,sizeof(struct thunk), PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      *thunk = default_thunk;
      thunk->env_addr = env;
      thunk->call_offset = code - (void *)&thunk->add_esp[0]; // Pretty!
      mprotect(thunk,sizeof(struct thunk), PROT_EXEC);
      return thunk;
    }

Finally, this makes our main function look like this:


    int main() {
      cfunc c = foo(5);

c(); c(); }

Now that's nicer...

Since this is already gcc/Linux/x86-32 specific, it can be made even nicer. gcc supports inner functions, and with a tiny bit of restructuring and a couple of macros I did this, mostly for fun to see how close to a "natural" syntax for closures I could get in C without changing the compiler:


    #define initenv(__vars__) struct env { __vars__ ; } * env = (struct env *)malloc(sizeof(struct env));
    #define new_closure(__block__) (closure)make_thunk(env,&__block__)

closure foo (int x) { initenv(int x) env->x = x;

printf ("x is %d\n",env->x);

void block (struct env * env) { env->x += 1; printf ("block: x is %d\n", env->x); }

return new_closure(block); }

I'll still stick to Ruby...

Some parting comments

I wrote this while exploring various approaches to add closure support to my Ruby compiler. I haven't quite made up my mind yet. The object approach is tempting because Ruby already has the Proc class, but I'll probably go for an environment + fat pointer approach that will be converted into a Proc object if assigned to anything (as opposed to just used for yield)

The thunk approach is somewhat appealing too, but if turned into a Proc object it may be as much or more overhead than the fat pointer approach (and there will only be one call site: In Proc#call) and this approach would mean the fat pointer wouldn't be passed around much - typical usage would be passing a block to a method, and then yield'ing to it, where the yield would be the only call site.



2009-12-16 18:26 UTC Writing a (Ruby) compiler in Ruby bottom up - step 23

This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.

Continuing down the rabbit hole: String

A couple of parts ago we established some of the problems with supporting even the seemingly simple attr_reader, attr_writer and attr_accessor. To reiterate, a naive implementation looks like this:

def attr_accessor sym
  attr_reader sym
  attr_writer sym
end

def attr_reader sym define_method sym do %s(ivar self sym) end end

def attr_writer sym define_method "#{sym.to_s}=".to_sym do |val| %s(assign (ivar self sym) val) end end

We resolved the issue of having a working sym.to_s, but that still leaves a number of things to address. For example, we need at least the start of a "proper" String class that we can add #to_sym to. That's what we'll look at this time.

The first bit is very similar to what we did for Symbol: We won't type tag, and we'll create String objects for literal strings by calling a low level function.

Quick sidebar: Hiding the low level implementation

MRI "hides" the gory details of String and other basic classes by not making them real Ruby objects per-se. There's data there that is simply not accessible without dipping into the C API. We may eventually want to take a similar tack - add a mechanism for allocating space for, setting and retrieving low level variables via the s-expression syntax that are completely invisible for the higher level API.

Currently we don't. The "raw" string used for Symbol and for String objects is a normal instance variable, and it's perfectly possible to expose it, and accessing it thinking it's a real object will trivially easily cause nasty crashes.

In general, we would eventually benefit from creating a mechanism to "hide" the low level plumbing from accidental access by normal Ruby. For now we'll retain the ability to shoot ourselves in the foot easily.

The basics

The String class will for now contain just a pointer to a C-style string. It will be immutable in it's first incarnation. This gets us a decent step further very easily - we can just keep the address to a C-style string constant.

As it turns out, this will be an easy foundation to build on:

MRI uses a number of flags for strings, for example to do "copy on write". Once we need to make strings mutable, we'll do the same. Strings created from constants will start out marked as "copy on write", and when doing an update, the string buffer will be copied into freshly allocated space first. The hope is that many string constants will never be updated.

We also need to implement String#to_sym. This is thankfully also reasonably easy: We need to access the low level function we added that will retrieve a symbol based on a "raw" c-style string.

The code

As the last couple of times, this was unfortunately committed a bit piecemeal, so the commits aren't easy to follow. But lets go through this step by step.

First of all, we're going to rewrite the tree, to alter a literal string into

%s(call __get_string ConstantReferringToTheRawString)

So, we alter compile to add a call to rewrite_strconst:

#
# Re-write string constants outside %s() t
# %s(call __get_string [original string constant])
def rewrite_strconst(exp)
  exp.depth_first do |e|
    next :skip if e[0] == :sexp
    is_call = e[0] == :call
    e.each_with_index do |s,i|
      if s.is_a?(String)
        lab = @string_constants[s]
        if !lab
          lab = @e.get_local
          @string_constants[s] = lab
        end
        e[i] = [:sexp, [:call, :__get_string, lab.to_sym]]
        # FIXME: This is a horrible workaround to deal
        # with a parser inconsistency that leaves calls
        # with a single argument with the argument "bare"
        # if it's not an array, which breaks with this rewrite.
        e[i] = [e[i]] if is_call && i > 1 
      end
    end
  end
end

Effectively this visits all nodes that are not s-expressions, and if it finds a string, it will add a string constant. It will then use the label that is allocated to rewrite the expression like this:

[:sexp, [:call, :__get_string, lab.to_sym]]

As the following line says, it's a nasty workaround, and it will be torn out as soon as I get a parser, so just ignore that... pretend it's not even there ;)

Now, the reason we wrap the call to __get_string is that otherwise it will be rewritten to a method call because there really is no such thing as function calls in Ruby, but for some of our low-level plumbing we need actual C-style function calls (such as for calling out to C code).

We're also changing strconst, the method used to get the "value" of a string constant as follows:

-  def strconst str
-    lab = @string_constants[str]
-    return lab if lab
-    lab = @e.get_local
-    @string_constants[str] = lab
-    return lab
+  def strconst(a)
+    lab = @string_constants[a]
+    if !lab # For any constants in s-expressions
+      lab = @e.get_local
+      @string_constants[a] = lab
+    end
+    return [:addr,lab]
   end

What's left? Not much, really. You can see the basi c String class

Here's the important bit:

class String
   def __set_raw(str)
     @buffer = str
   end
end

def __get_string(str) s = String.new %s(callm s __set_raw (str)) s end

It's worthwhile reading the comments in at the link above if you want to see more about where this is heading.

We'll flesh out the String class in coming parts, but this groundwork now means we're working on a "real" object instead of on a raw pointer to a c-style string, which means it's mostly a runtime library issue instead of requiring much in terms of compiler changes.

One of the things worth mentioning that is covered in one of the comments in lib/core/string.rb is that the whole __get_string and __get_symbol thing is a bit ugly. As I mentioned earlier in this part, hiding the low level implementation would be nice, and that includes preventing direct calls to __get_string and _get_symbol, and that can be done trivially by simply ensuring the Ruby code can't directly do function calls (as opposed to method calls). But that doesn't solve String#__set_raw. To fix that, I'll need to either remove the need for the method, or by making it possible to "hide" certain methods entirely from the Ruby code...

It's not a priority now, though - first we get things working, then we make it pretty.

What about String#to_sym? Well, this ought to do the trick:

class String
  def to_sym
    __get_symbol(@buffer)
  end
end

So, that leaves us with a working define_method and %s(lambda) with support for variables to get attr_reader,attr_writer and attr_accessor working... We're heading down that road soon.



2009-12-10 18:53 UTC Ruby gets a spec

A week or so ago, the Ruby Draft specification made the rounds

Yes... Ruby is finally getting a standard. While RubySpec has been around for a while, and is a great, it is an executable specification that tells you what, not why, and it aims to be "complete" while the new project aims to define a shared core suitable for implementation by all the different Ruby implementations that are springing up.

I think they're complementary enough that the Ruby Draft Specification could be very beneficial.

For my compiler project this means there's now a compelling reason to revisit some things (such as the parser) to look at conformance, and a clear roadmap for some others (classes) to ensure it at least meets the spec.

Of course, in many ways MRI will still be the benchmark, but the standard will provide a minimal level of conformance that will likely be easier to achieve.

In particular, the spec includes a formal grammar for Ruby that doesn't involve walking through thousands of lines of code written for another implementation to verify that you're doing things right. I'm particularly looking forward to going through my parser and aligning it more closely with the draft spec... (I'm sure I'll be swearing a lot while doing it, though)

It also includes descriptions of expected semantics for things like the object model that have previously been something a lot of people (including me) have spent time revers engineering to figure out.

Combined with all the Ruby implementations in progress, this is a clear indication that Ruby is growing up and becoming a contender in areas where it would previously not be acceptable.

Keep an eye on the Ruby standard project...



Older Entries

  • 2009-11-10 Writing a (Ruby) compiler in Ruby bottom up - step 22
  • 2009-11-01 Writing a (Ruby) compiler in Ruby bottom up - step 21
  • 2009-05-05 Writing a (Ruby) compiler in Ruby bottom up - step 20
  • 2009-04-19 The problem with compiling Ruby
  • 2009-04-16 Writing a compiler in Ruby bottom up - Milestone: It can parse itself...
  • 2009-03-10 Writing a compiler in Ruby bottom up - step 19
  • 2009-02-23 Writing a compiler in Ruby bottom up - step 18
  • 2009-02-21 Writing a compiler in Ruby bottom up - step 17
  • 2009-02-17 Writing a compiler in Ruby bottom up - step 16
  • 2009-02-14 Writing a compiler in Ruby bottom up - step 15
  • 2009-02-12 Writing a compiler in Ruby bottom up - step 14
  • 2009-02-01 Just added a github repository for my compiler series
  • 2009-01-26 Writing a compiler in Ruby bottom up - step 13
  • 2008-12-08 A simple Operator Precedence parser
  • 2008-10-27 Writing a compiler in Ruby bottom up - step 12
  • 2008-09-29 Writing a compiler in Ruby bottom up - step 11
  • 2008-07-10 Writing a compiler in Ruby bottom up - step 10
  • 2008-06-19 Writing a compiler in Ruby bottom up - step 9
  • 2008-06-01 Writing a compiler in Ruby bottom up - step 8
  • 2008-05-16 Writing a compiler in Ruby bottom up - step 7
  • 2008-05-06 Unholy: Converting Ruby 1.9 bytecode to Python bytecode
  • 2008-05-03 Writing a compiler in Ruby bottom up - step 6
  • 2008-04-27 Writing a compiler in Ruby bottom up - step 5
  • 2008-04-17 Writing a compiler in Ruby bottom up - step 4
  • 2008-04-08 Strawman for a new parser generator
  • 2008-04-06 Ur-Scheme: A tiny self-hosting Scheme to x86 asm compiler
  • 2008-04-05 - Why am I forced to optimize when choosing my language?
  • 2008-04-05 Writing a compiler in Ruby bottom up - step 3
  • 2008-03-27 Writing a compiler in Ruby bottom up - step 2/??
  • 2008-03-26 Writing a compiler in Ruby bottom up - step 1/??
  • 2008-03-21 Shotgun: The Rubinius virtual machine and some musings on compiling Ruby
  • About me

    E-mail: vidar@hokstad.com Skype: vhokstad
    Twitter: vhokstad
    View my LinkedIn profile.

    I was born April 21st, 1975, in Oslo, Norway. Since 2000 I've been living in London, UK. I'm married and we just had our first child, Tristan Ikemefuna Hokstad.

    I'm working for Aardvark Media as Director of Technology. I'm also currently on the board of SpatialQ, a startup in the GIS space, and an advisor to Skoach, a startup doing a time management app for people with ADD.

    Twitter Updates

      follow me on Twitter