
2009-05-05 19:16 UTC Writing a (Ruby) compiler in Ruby bottom up - step 20


You may or may not have seen my recent post where I admitted to more or less having decided to make my compiler project a Ruby compiler. On the downside this means a lot of complexity that may make it harder to follow. On the upside... Well, you get to read about a compiler for a "real" language as opposed to a toy, and hopefully the result will be usable and I'll manage to contain myself and not go off on wild tangents (I have a long list of experiments I want to do).

Since the last part things have moved quite far, and I'm not going to do a "patch by patch" overview of what's happened since. Instead I will this time give a rough overview of the state of the compiler as of starting to write this - April 25th (slightly derailed by having a child). Parts of the compiler with more extensive changes will get more "coverage".

Let me again point out that all of the code is on Github. Specifically the state of the code when writing this can be found at tag "step-20c".

Furthermore, I'd like to extend an invitation to anyone who's interested in contributing. Go check out the project from Github, fork it if you like, and send me pull requests when you have something you'd like me to consider for the main tree, or just let me know if it's ok for me to pull things in whenever I see something interesting. There are now two of us - Christopher Bertels started contributing a while back, and has added lots of rdoc documentation and a range of other improvements, such as part of the instance variable support.

We've so far decided to go with the same license as Matz' version of Ruby, but I'm open to discussion about that. Specifically I would prefer the core library to be licensed under terms sufficiently loose to allow closed source apps to be compiled as well.

Anyway, let's see what things look like now.

The Parser

If you've followed my blog, you'll probably have seen the announcement that the compiler can now parse itself. I'm sure it still has bugs (it certainly did when I posted the announcement, as I've found out since), and it most certainly can't compile the AST it generates, but it's still a fairly big milestone. It also means that it can parse a small but relatively central subset of Ruby (there are many, many gaps).

The parser is split in three components:

  • The low level s-expression style parser. This part has hardly changed.
  • The operator precedence parser used for expressions. This has changed a lot, and needs a cleanup, but I'll attempt a description of the current state.
  • The "high level" parser used for control structures etc. This is a simple recursive descent parser. It also would benefit from a cleanup, but it's not hard to understand (I think).

First of all, let us take a look at something else: Test cases. Parsing Ruby is nasty. As much as I love to program in Ruby, the grammar makes me feel downright dirty. It is full of context sensitive parts, exceptions and weird rules that make it painful to write a small and readable parser for it. All the more reason to write extensive sets of test cases.

Testing the parser components

I've been using Cucumber to add test cases, mainly because so far for the parser the test cases are very repetitive "here's the input, it's meant to generate this tree" kind of tests, and Cucumber makes that look nice and readable:
Feature: Shunting Yard
    In order to parse expressions, the compiler uses a parser component that uses the shunting yard
    algorithm to parse expressions based on a table.

    Scenario Outline: Basic expressions
        Given the expression <expr>
        When I parse it with the shunting yard parser
        Then the parse tree should become <tree>

    Examples:
        | expr                 | tree                                  |
        | "__FILE__"           | :__FILE__                             |
        | "1 + 2"              | [:add,1,2]                            |
        | "1 - 2"              | [:sub,1,2]                            |
        | "1 + 2 * 3"          | [:add,1,[:mul,2,3]]                   |
        | "1 * 2 + 3"          | [:add,[:mul,1,2],3]                   |
        | "(1+2)*3"            | [:mul,[:add,1,2],3]                   |
        | "1 , 2"              | [:comma,1,2]                          |
        | "a << b"             | [:shiftleft,:a,:b]                    |
        | "1 .. 2"             | [:range,1,2]                          |
        | "a = 1 or foo + bar" | [:or,[:assign,:a,1],[:add,:foo,:bar]] |
        | "foo and !bar"       | [:and,:foo,[:not,:bar]]               |
        | "return 1"           | [:return,1]                           |
        | "return"             | [:return]                             |
        | "5"                  | 5                                     |
        | "?A"                 | 65                                    |
        | "foo +\nbar"         | [:add,:foo,:bar]                      |
        | ":sym"               | :":sym"                               |
        | ":[]"                | :":[]"                                |

As of writing, the parser components have 106 scenarios (each entry in the example tables counts as one scenario) including a few failing ones. Whenever we find anything broken, a new test case goes in. 106 scenarios is nowhere near complete, but it helps tremendously with debugging, particularly with the operator precedence parser, as it's been fairly tedious to adjust and adapt it to handle the peculiarities of Ruby. It also serves as documentation of sorts of what the parser is expected to deliver.

The operator precedence parser

We've covered this component before, sort of (go revisit it if you're unsure about it), but it's grown significantly in complexity since, mostly to account for weirdness in the Ruby grammar, some to account for missing functionality. A word of warning before we start: It's gotten messy, and it needs to be cleaned up and refactored. But it's not that hard to figure out.

I'm going to assume you're familiar with the previous parts, and have a rough idea of Dijkstra's Shunting Yard algorithm which this parser component is based on. We'll examine it top-down the way it's being used:

  def self.parser(scanner, parser)
     ShuntingYard.new(TreeOutput.new,Tokens::Tokenizer.new(scanner), parser)
  end
We start by instantiating the parser and passing the most frequently used components to it. By passing an alternative to TreeOutput you can turn the output into reverse polish notation or anything you like. But we want a parse tree. Tokens::Tokenizer is used to retrieve a stream of tokens instead of working on individual characters. Last but not least we're passing it a reference to the recursive descent parser. Yuck. We do this because of Ruby's block syntax: blocks can look like literal hashes and/or can be chained and used as part of an expression, which means we need to be able to call back out of the shunting yard parser.
It might be that it'd be just as well to combine the two, but I don't like tight coupling and I'm still living in hope of getting the time to figure out a way to reduce the coupling between these two components, and make the shunting yard parser generic enough to be reusable. One of the promises of an operator precedence parser is to be configured with a simple table of operators, and it's possible more of the exceptions in this current model can be handled by adding a few flags.
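
To make the role of the output handler more concrete, here is a minimal sketch of two interchangeable handlers. It assumes only the value/oper style of interface that shunt calls on @out; the class names RpnSketch and TreeSketch, and the use of a plain symbol plus an arity in place of the real operator objects, are inventions for this example and not the project's TreeOutput.

```ruby
# Hypothetical output handlers sharing one interface: value(v) and oper(sym, arity).

# Collects everything in reverse polish order.
class RpnSketch
  attr_reader :result

  def initialize
    @result = []
  end

  def value(v)
    @result << v
  end

  def oper(sym, _arity = 2)
    @result << sym
  end
end

# Builds a nested-array tree by popping operands off a value stack.
class TreeSketch
  def initialize
    @vstack = []
  end

  def value(v)
    @vstack << v
  end

  def oper(sym, arity = 2)
    args = @vstack.pop(arity)
    @vstack << [sym, *args]
  end

  def result
    @vstack.last
  end
end

# Feed a handler the event stream a shunting yard parser would produce
# for "1 + 2 * 3": the values 1, 2, 3, then :mul, then :add.
def feed(out)
  out.value(1)
  out.value(2)
  out.value(3)
  out.oper(:mul)
  out.oper(:add)
  out.result
end

p feed(RpnSketch.new)
p feed(TreeSketch.new)
```

The parser logic stays the same in both cases; only the object receiving the events decides what shape the result takes.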

On to the next bit:
    def parse(inhibit=[])
      out = @out.dup
      out.reset
      tmp = self.class.new(out, @tokenizer,@parser)
      res = tmp.shunt(@tokenizer,[],inhibit)
      res ? res.result : nil
    end
A couple of oddities here. I've introduced an argument to allow specifying a set of tokens to inhibit and terminate the expression. This is needed because of peculiarities in how Ruby handles commas. In some places a comma is ok, as it separates function arguments or array elements, but you also need to be able to put expressions IN argument lists as default assignments, and there commas are not allowed without parentheses, as they're needed to separate arguments, and the argument list itself is parsed by the recursive descent parser. It would be possible to change this, but I'm not sure I want to try to make the argument list be parsed by the same component until I'm more sure of the consequences. Besides, the current arrangement works.
You can also see we do some annoying initialization - copying and resetting the tree output object. This is done because the parser recursively calls itself, and the state of the output tree builder needs to be clean when it gets called for a subtree in parentheses, for example.

The next bit is in a fairly shameful state at the moment, but rather than wait until it's cleaned up I'd rather show you the intermediate state, and we'll revisit it again later once I have it all figured out - the shunting parser probably deserves its own follow-up article as it can get a bit hard to follow.


Leading up to the main loop, which I'll break up in several chunks:

    def shunt(src, ostack = [], inhibit = [])
      possible_func = false     # was the last token a possible function name?
      opstate = :prefix         # IF we get a single arity operator right now, it is a prefix operator
                                # "opstate" is used to handle things like pre-increment and post-increment that
                                # share the same token.
      lp_on_entry = ostack.first && ostack.first.type == :lp
      opcall  = Operators["#call#"]
      opcallm = Operators["."]
      lastlp = true

The commented local vars are hopefully reasonably understandable. "lp_on_entry" is set to true if the method is passed an :lp operator in - that means '(','[', '{' etc.. We'll see how that's used later.

opcall and opcallm are just convenience shortcuts, though the hardcoded references to Operators[] are ugly (we'll meet more of them, and they'll go as soon as I find a clean way of decoupling the logic in this method).

"lastlp" is set to indicate whether or not the last token was an :lp type operator.

      src.each do |token,op|
        if inhibit.include?(token)
          src.unget(token)
          break
        end

Ok, so this is the start of the main loop. We get the token as a string in "token", and if it's an operator we get the operator object in "op". We check if the token is a member of a set of "inhibited" tokens that we don't allow to be part of an expression. This is used to handle cases where the recursive descent parser wants an expression that stops on encountering something that is normally a legal part of an expression. We'll see it used when we look at the recursive descent parser.

        if op
          op = op[opstate] if op.is_a?(Hash)

We then start handling operator tokens - most of the loop is split between handling operators and handling non-operator tokens, with very little shared logic. In some cases there will be two operators that share the same token. Currently we handle :infix_or_postfix vs. :prefix operators, and the only case it's currently used for is "*" as multiplication operator vs. as the "splat" operator (which expands arrays).
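
As a rough illustration of that lookup, here is a sketch of a table where one token maps to a hash of operators keyed by parser state. The Oper struct, the OPERATORS_SKETCH table and the priorities are made up for the example; only the `op = op[opstate] if op.is_a?(Hash)` step is taken from the code above.

```ruby
# Hypothetical miniature operator table; names and priorities are invented.
Oper = Struct.new(:pri, :sym, :type)

OPERATORS_SKETCH = {
  "+" => Oper.new(10, :add, :infix),
  # One token, two meanings: which applies depends on the parser state.
  "*" => {
    :infix_or_postfix => Oper.new(20, :mul,   :infix),
    :prefix           => Oper.new(8,  :splat, :prefix)
  }
}

# Resolve a token to a single operator given the current opstate.
def lookup_operator(token, opstate)
  op = OPERATORS_SKETCH[token]
  op = op[opstate] if op.is_a?(Hash)
  op
end

# After a value ("a * b") the state is :infix_or_postfix, so "*" multiplies.
puts lookup_operator("*", :infix_or_postfix).sym
# At the start of an operand ("foo(*args)") the state is :prefix, so "*" is splat.
puts lookup_operator("*", :prefix).sym
```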

          # This makes me feel dirty, but it reflects the grammar:
          # - Inside a literal hash, or function call arguments "," outside of any type of parentheses binds looser than a function call,
          #   while outside of it, it binds tighter... Yay for context sensitive precedence rules.
          # This whole module needs a cleanup
          op = Operators["#,#"] if op == Operators[","] and lp_on_entry

The comment above speaks for itself, no? And we see a use for lp_on_entry to help decide the precedence rule to use for the comma, by switching objects around (Note that this also shows an anti-pattern that needs to be fixed: When you have hashes, try to avoid string keys. Strings are expensive objects where symbols get mapped to an integer - for things like this using symbols as keys tends to be more efficient).
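
A small sketch of the difference, for what it's worth. How much it matters in practice depends on the Ruby implementation; the symbol-keyed table at the end uses invented key names and is not the project's Operators table.

```ruby
# Two equal strings built at runtime are distinct objects...
a = String.new("#,#")
b = String.new("#,#")
puts a == b        # same contents
puts a.equal?(b)   # but not the same object

# ...while a symbol with a given name is always one and the same object.
puts :comma.equal?("comma".to_sym)

# A symbol-keyed lookup table (hypothetical keys and values).
OPS_BY_SYMBOL = { :comma => :loose_comma_op, :call => :call_op }
puts OPS_BY_SYMBOL[:comma]
```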

          if op.sym == :hash_or_block || op.sym == :block
            if possible_func || ostack.last == opcall || ostack.last == opcallm
              @out.value([]) if ostack.last != opcall
              @out.value(parse_block(token))
              @out.oper(Operators["#flatten#"])
              ostack << opcall if ostack.last != opcall
            elsif op.sym == :hash_or_block
              op = Operators["#hash#"]
              shunt(src, [op])
              opstate = :infix_or_postfix
            else
              raise "Block not allowed here"
            end

The whole block above gets executed if the current operator is a '{' (:hash_or_block) or :block ("do"), and it's one of the hairy bits... If we've seen what is possibly the start of a block (possibly, because '{' could also start a hash, hence the symbol), we first check if we're possibly looking at a function call (a pre-requisite for a block) based on the previous token, OR if the last operator to have been pushed on the operator stack is the function or method call operators (note: We currently still differentiate between function and method calls, though that is not a distinction Ruby makes; it is helpful for some aspects of the parser, and it is helpful for the low level s-expression syntax, but this distinction will normally become hidden from the "end-user"). 

If there's a function call being handled, then we know we're dealing with a block. We output an empty argument array if needed, and then call back up into the recursive descent parser via "parse_block" (token is passed so that the parser can determine what symbol the block should end on - '}' vs. 'end'). The "fake" operator "#flatten#" is output to aid in the tree building as the tree builder doesn't directly handle cases where a node has more than two children/arguments. This is a bit of a hack, and there is actually no real reason why a shunting yard parser can't be adapted to directly handle operators that are prefixes to multiple operands, so that may be part of the cleanup later. 

If we're NOT dealing with a function call, we check if we're looking at the :hash_or_block ('{') operator, and if so we know we're now dealing with a Hash, and we recursively call shunt to handle the interior of the Hash.

          else
            if op.type == :rp
              @out.value(nil) if lastlp
              @out.value(nil) if src.lasttoken and src.lasttoken[1] == Operators[","]
              src.unget(token) if !lp_on_entry
            end

If we are not dealing with a potential block, we move on. First, above, we check for a :rp (right parenthesis, bracket or brace) operator. We "fake" a value if we've just seen an empty pair of parentheses, so we don't have to deal with operand-less operators elsewhere. We also fake a value if the last token was a comma operator. This is to handle the convenience case in Ruby where arrays or hashes can have a "dangling" comma at the end like so: [1,2,3,] (useful for symmetry and machine generating Ruby source).
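
For reference, this is the Ruby behaviour the parser has to reproduce; the method names are only there to make the example easy to call.

```ruby
# A trailing comma is accepted in array and hash literals and changes nothing.
def trailing_comma_array
  [1, 2, 3,]
end

def trailing_comma_hash
  { :a => 1, :b => 2, }
end

p trailing_comma_array
p trailing_comma_hash
```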

            reduce(ostack, op)

We call reduce in order to force output of any tighter binding operators. In the case of, say, 1 * 2 + 3, (* 1 2) will get output when encountering '+'.
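
Stripped of everything Ruby-specific, the idea can be sketched as below. This is a toy with an invented priority table and a combined value stack, assuming only left-associative binary operators; it is not the project's reduce method.

```ruby
# Toy operator tables: a higher number binds tighter. All names are hypothetical.
PRI = { :add => 10, :sub => 10, :mul => 20, :div => 20 }
SYM = { "+" => :add, "-" => :sub, "*" => :mul, "/" => :div }

# Pop every stacked operator that binds at least as tightly as `op`
# (or everything, if op is nil), combining the top two values for each.
def reduce_sketch(ostack, vstack, op = nil)
  while !ostack.empty? && (op.nil? || PRI[ostack.last] >= PRI[op])
    right = vstack.pop
    left  = vstack.pop
    vstack << [ostack.pop, left, right]
  end
end

# Parse a flat token list of integers and binary operator strings.
def parse_tokens(tokens)
  ostack = []
  vstack = []
  tokens.each do |t|
    if (op = SYM[t])
      reduce_sketch(ostack, vstack, op)  # force out tighter-binding operators
      ostack << op
    else
      vstack << t
    end
  end
  reduce_sketch(ostack, vstack)          # flush whatever is left
  vstack.last
end

p parse_tokens([1, "*", 2, "+", 3])
p parse_tokens([1, "+", 2, "*", 3])
```

In the first call the '*' is reduced as soon as '+' arrives; in the second, '+' stays on the stack until the end because '*' binds tighter.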

            if op.type == :lp
              shunt(src, [op])
              opstate = :infix_or_postfix
              # Handling function calls and a[1] vs [1]
              ostack << (op.sym == :array ? Operators["#index#"] : opcall) if possible_func

If it's an lp type operator, we do as we did for Hash ('{'), except we also want to selectively output "#index#" or the "#call#" operators if we're dealing with a function call. This occurs when we have an expression like "foo(1)", in which case "possible_func" will be true after "foo" has been encountered, and we then parse "(1)" and pass the result onto the output handler, and then output the "#call#" operator to tie "foo" and its arguments together. As the slightly misleading comment says, "#index#" is used if we see '[...]' after something that might indicate a function call, as we need to differentiate a[1] (a call to the method "[]" on the object reference held in "a") vs [1] (constructing an Array object).

            elsif op.type == :rp
              break

If we dealt with an :rp we want to exit from the loop.

            else
              opstate = :prefix
              ostack << op
            end
          end

... if not we just push the operator on the operator stack.

        else
          if possible_func
            reduce(ostack)
            ostack << opcall
          end
          @out.value(token)
          opstate = :infix_or_postfix # After a non-operator value, any single arity operator would be postfix,
                                      # so when seeing the next operator we will assume it is either infix or postfix.
        end
        possible_func = op ? op.type == :lp :  !token.is_a?(Numeric)
        lastlp = false
        src.ws if lp_on_entry
      end
Ok, time to handle non-operator values. Not much to say about this - it's hopefully reasonably understandable. The last line above will skip whitespace including line feeds if we're inside a parenthesized sub-expression, while normally (inside the tokenizer) we only skip whitespace excluding linefeeds. This is because the general rule in Ruby is to allow linefeeds anywhere where it doesn't create an ambiguity. I'm sure there are more special cases we'll need to handle, but for now this approximates the rule closely enough.
      if opstate == :prefix && ostack.size && ostack.last && ostack.last.type == :prefix
        # This is an error unless the top of the @ostack has minarity == 0,
        # which means it's ok for it to be provided with no argument
        if ostack.last.minarity == 0
          @out.value(nil)
        else
          raise "Missing value for prefix operator #{ostack[-1].sym.to_s}"
        end
      end

      reduce(ostack)
      return @out if ostack.empty?
      raise "Syntax error. #{ostack.inspect}"
    end

This is the last part we'll look at (for the "reduce" method, see the earlier post), and it's outside the loop. It's down to handling errors and optional operands, and then reducing the operator stack completely before returning the output handler (which should at this point hopefully contain a complete expression).

The recursive descent parser

The operator precedence parser was the hard part of parsing. There's not really all that much to say about the changes to the recursive descent parser - it's all really formulaic, so I'll pick one of the larger functions and walk through that:

  def parse_block(start = nil)
    pos = position
    return nil if start == nil and !(start = expect("{")  || expect("do"))
    close = (start.to_s == "{") ? "}" : "end"
    ws
    args = []
    if expect("|")
       ws
      begin
        ws
        if name = parse_name
          args << name
          ws
        end
      end while name and expect(",")
      ws
      expect("|")
    end
    exps = parse_block_exps
    ws
    expect(close) or expected("'#{close.to_s}' for '#{start.to_s}'-block")
    return E[pos,:block] if args.size == 0 and !exps[1] || exps[1].size == 0
    E[pos,:block, args, exps[1]]
  end

This is what the shunting yard parser calls back into. Most of this should be fairly understandable assuming you remember that "expect" tries to match a string, and "ws" skips past whitespace. 
The major new thing here is "position" and that "E[]" stuff. "position" is a method that returns what the scanner thinks is the current position. It isn't guaranteed to be 100% accurate all the time as the scanner doesn't itself keep track of the length of lines it has scanned past, so it depends on the token being "unget" to contain a position for that (see scanner.rb - not going in depth into that).

In reality it works well enough to give reasonable error reporting during parsing. But what about errors identified in later stages?

That's where E[] comes in. It's a shortcut to create an instance of a subclass of Array: AST::Expr. We'll be using this class to carry additional annotation of the nodes in the parse tree in later stages. For now the additional element is the position, which it takes from the first argument that either is a position or has one. See ast.rb for the details. 
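
A minimal sketch of the idea, assuming nothing about the real ast.rb beyond what is described above: an Array subclass whose [] constructor peels off a position and keeps it as an annotation. PositionSketch and ExprSketch are hypothetical names.

```ruby
# Hypothetical stand-ins; the real classes live in scanner.rb and ast.rb.
PositionSketch = Struct.new(:filename, :lineno, :col)

class ExprSketch < Array
  attr_accessor :position

  # ExprSketch[pos, :block, ...] strips a leading position off the element
  # list and stores it as an annotation on the node instead.
  def self.[](*args)
    pos = args.first.is_a?(PositionSketch) ? args.shift : nil
    # Otherwise borrow the position of the first element that carries one.
    pos ||= args.find { |a| a.respond_to?(:position) && a.position }&.position
    node = new(args)
    node.position = pos
    node
  end
end

pos   = PositionSketch.new("example.rb", 3, 7)
inner = ExprSketch[pos, :block, [:a], [:call, :puts]]
outer = ExprSketch[:while, :cond, inner]

p inner                  # still behaves like a plain array
p outer.position.lineno  # position inherited from the inner node
```

Because the node is still an Array, code that walks the tree does not need to know about the annotation unless it wants to report an error.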

Compiling the syntax tree

The compiler class itself hasn't changed all that much since last time apart from much better error reporting (see error()). The most significant worthwhile change to look at is the implementation of basic support for instance variables in objects. As for vtables for classes we're taking major shortcuts to start with.

The magic starts in "scope.rb".  Specifically we now keep an array of instance variables we've seen for this current class. This chunk in Scope#get_arg will return [:ivar, offset] where "offset" is the number of the "slot" of PTR_SIZE values starting from the beginning of an instance object that this instance variable belongs to.
    # instance variables.
    # if it starts with a single "@", it's a instance variable.
    if a.to_s[0] == ?@ or @instance_vars.include?(a)
      offset = @instance_vars.index(a)
      add_ivar(a) if !offset
      offset = @instance_vars.index(a)
      return [:ivar, offset]
    end
At the moment the above contains a bug, as the instance variables will actually get added too late for the "new" method to correctly allocate an object of the right size. So the next batch of changes will include one that "visits" the nodes of a class definition to pre-allocate the instance variable offsets.
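
Such a pre-pass could look roughly like the sketch below. The nested-array AST shape, the method names, and the assumption that slot 0 is reserved for @__class__ are all simplifications for illustration, not the compiler's actual code.

```ruby
# Walk a nested-array AST and collect instance variable names in the order
# they first appear. Names with a double "@@" are class variables and skipped.
def collect_ivars(node, found)
  case node
  when Array
    node.each { |child| collect_ivars(child, found) }
  when Symbol
    s = node.to_s
    if s.start_with?("@") && !s.start_with?("@@") && !found.include?(node)
      found << node
    end
  end
  found
end

# Map each instance variable to a slot number before any method is compiled.
# Slot 0 is assumed to be taken by the class pointer.
def ivar_offsets(class_body)
  offsets = {}
  collect_ivars(class_body, [:@__class__]).each_with_index do |name, i|
    offsets[name] = i
  end
  offsets
end

body = [:defun, :initialize, [],
        [[:assign, :@x, 1], [:assign, :@y, [:add, :@x, 2]]]]
p ivar_offsets(body)
```

With the offsets known up front, the object size is simply the number of entries times the pointer size, and "new" can allocate the right amount.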
This current approach also has another problem: In Ruby you can dynamically add instance variables to an object. There's no guarantee that all (or any) of the instance variables identified will actually be added to any specific instance. Here the compiler will need to make a trade-off:

We can allocate space for all the instance variables we see (we still need a way to handle dynamically allocated ones) or we can pick a subset that is likely to be present on most objects (if it's set in "initialize" for example). For dynamically allocated ones we can set aside space for pointers to a hash table to hold them. We can also use a level of indirection to "pack" the instance variables more to avoid having to resort to the hash tables as often as we otherwise might do. There's no guarantee any specific one of these strategies will be best for exactly your app, so there's room for switches here, or have the system dynamically optimize the choices at runtime (at the cost of more complexity). For now, though, we'll just allocate sufficient space for everything we see, and ignore dynamically allocated ones. Then later, we'll add support for handling dynamically allocated ones, and then we can start looking at optimizations. As usual we take the shortcut that brings us functionality first.
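
One of those strategies, fixed slots plus a hash table fallback, can be modelled in plain Ruby as below. This only illustrates the layout idea; ObjectLayoutSketch is a hypothetical name and the compiler does none of this yet.

```ruby
# Models one object: fixed slots for the instance variables known at compile
# time, plus a lazily created hash for anything added dynamically.
class ObjectLayoutSketch
  def initialize(known_ivars)
    @offsets = {}
    known_ivars.each_with_index { |name, i| @offsets[name] = i }
    @slots    = Array.new(known_ivars.size)
    @overflow = nil   # only allocated if a dynamic ivar shows up
  end

  def set(name, value)
    if (i = @offsets[name])
      @slots[i] = value                  # cheap: fixed offset
    else
      (@overflow ||= {})[name] = value   # fallback: hash table
    end
  end

  def get(name)
    if (i = @offsets[name])
      @slots[i]
    else
      @overflow && @overflow[name]
    end
  end

  def overflow_used?
    !@overflow.nil?
  end
end

obj = ObjectLayoutSketch.new([:@x, :@y])
obj.set(:@x, 1)
obj.set(:@late, 42)   # not known at "compile time"
p obj.get(:@x)
p obj.get(:@late)
```

Objects that never gain a dynamic instance variable pay only for one unused pointer.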

The compilation of the instance variables is pretty trivial - loads and saves happen in compiler.rb, in compile_eval_arg and compile_assign respectively. Here's what we do in compile_assign:
    if atype == :ivar
      ret = compile_eval_arg(scope,:self)
      @e.save_to_instance_var(source, ret, aparam)
      return [:subexpr]
    end
Emitter#save_to_instance_var just stores the value to the appropriate offset into the object.

The support libraries

We've barely started on the support libraries, so this will be short, but it's worth taking a brief look at what's in the main tree so far, not least because it demonstrates how I intend to proceed - even core elements of the object model, such as the Object and Class classes will largely be written in Ruby, and make use of only very limited bootstrapping help from the compiler. Only a small number of elements will require us to "dip out" of pure Ruby, and in those cases we'll as much as possible rely on the s-expression syntax that allows direct access to the AST. Doing C etc. is a last resort or temporary workaround only - a goal of this project is minimal dependencies.

Here's what we use to bootstrap Class:
def __new_class_object(size)
  ob = malloc(size)
  %s(assign (index ob 0) Class)
  ob
end

class Class
  def new
    # @instance_size is generated by the compiler. YES, it is meant to be
    # an instance var, not a class var
    ob = malloc(@instance_size*4)
    %s(assign (index ob 0) self)
    ob
  end
end

__new_class_object is called by the bootstrap code generated by the compiler (in compile_class). It just uses the C-library's malloc() for now, and assigns a pointer to the class Class in the first slot.
Class#new is implemented so that it will take the instance size from each subclass (that's why we use an instance variable of the class instead of a class variable - class variables are shared across subtrees of the inheritance tree in Ruby), and then does more or less what __new_class_object does.
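
The difference between the two kinds of variable is easy to see in ordinary Ruby. The class names and values here are invented for the demonstration.

```ruby
class BaseSketch
  @@shared = :base         # class variable: one copy for the whole subtree
  @instance_size = 1       # instance variable of the object BaseSketch itself

  def self.instance_size
    @instance_size
  end

  def self.shared
    @@shared
  end
end

class DerivedSketch < BaseSketch
  @@shared = :derived      # overwrites the copy BaseSketch sees too
  @instance_size = 3       # separate from BaseSketch's @instance_size
end

p BaseSketch.shared            # :derived
p BaseSketch.instance_size     # 1
p DerivedSketch.instance_size  # 3
```

Each class object gets its own @instance_size, which is exactly what an allocator needs.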

It's worth noting that we make this raw class reference available as the instance variable @__class__ in the objects. Why not as "class"? Well, the Ruby object model is finicky. Modules and meta-classes are inserted into the inheritance chain (for good reason - it makes things a lot easier), but they are "hidden" from you when you access self.class. So @__class__ is an implementation detail that should not be depended on other than to help us implement the core classes. The same is true for "@instance_size" in class Class.
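
The hiding can be observed from standard Ruby. This sketch uses a throwaway module name, and singleton_class, which is the modern way to get at the meta-class.

```ruby
module GreeterSketch
  def greet
    "hi"
  end
end

# Extending a single object inserts the module via that object's meta-class.
def extended_object
  obj = Object.new
  obj.extend(GreeterSketch)
  obj
end

obj = extended_object
p obj.class                                              # Object
p obj.singleton_class.ancestors.include?(GreeterSketch)  # true
p Object.ancestors.include?(GreeterSketch)               # false
```

So the class actually used for dispatch is not the one self.class reports, which is why the runtime needs its own raw reference.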

Final words (for now)

As usual, if you have questions or comments, they are welcome. Get all of the code from Github. The state of the code when writing this can be found at tag "step-20c".

And feel free to get in touch to contribute, or just fork the code and hack away.



There's already a ton of changes since I started writing this part. The next milestone is to get the compiler to compile itself - that's probably 2-3 articles away. The first step will be to get it to generate code that will link, but I fully expect it to have plenty of problems that will prevent it from running... An article on debugging the code generation is likely to be forthcoming at that stage...
Once it can compile itself, my next goal is to work on getting it to compile mspec, in order to get a measure of what is missing from Ruby. It will require quite a few fixes to the parser.





2009-04-19 17:45 UTC The problem with compiling Ruby

Compiler technology is one of my hobbies, most recently satisfied with my project to write a series of posts on how to write a (Ruby) compiler in Ruby. Since I really like Ruby, it's natural that I've done a fair amount of thinking about compiling Ruby, and what the problems with that are, and I decided it was time I wrote some of them down along with some thoughts on how they can be solved -- ideally without changing Ruby. Even more so since I decided I DO want to take the leap towards making my compiler project focus on actually compiling Ruby, and not just something somewhat like Ruby.
All of the issues here affect a traditional "ahead of time" compiler - one that takes source in and spits out a binary - but many also affect JITs.

I'll start with the problems, in no particular order:

Problem #1: No delineation of compile time and runtime

Many scripting languages fall in this category, but Ruby does it one better, by making the class definitions executable. Where a compiler for a language like C++ can do a lot of work to analyze and optimize code at compile time, and can build most structures related to classes at compile time, in Ruby it may in some cases be hard to avoid executing code at runtime to determine what the class will look like.

A potential Ruby compiler also needs to solve the issue of what files are compiled once at compile time and linked into the application, and what files are evaluated (and possibly JIT compiled) at runtime. This is a possibly tricky tradeoff - reloading code has become a common solution for Ruby web applications to avoid shutdown and restarts, for example, but there are no formal hints in Ruby code that tell the application what may or may not be reloaded later.

Reloading in Ruby tends to depend on the ability of the Ruby object model to replace classes and methods, so the most natural solution for handling reloading may simply be to compile the classes in statically but retain this ability. That still doesn't solve the first part of the problem, though: If my app starts with "require 'foo'", do I expect "foo" to be loaded when compiling or each time? You probably wouldn't be surprised if the compiler loaded it once. But what if it starts with code that reads the contents of a directory, and require's each file in turn? That one is trickier - some scripts may use that method as a crude "plugin" mechanism, for example.
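
The two situations look like this in code. The plugin file names and the $plugins_seen global are made up so that the example can run on its own in a temporary directory.

```ruby
require "tmpdir"

# Case 1: a literal path. A compiler can see exactly which file is meant:
#   require 'foo'

# Case 2: the set of files is only known once the program is running.
def load_plugins(dir)
  Dir.glob(File.join(dir, "*.rb")).sort.map do |path|
    require path                 # the path is computed, not a literal
    File.basename(path, ".rb")
  end
end

# Demonstrate with two throwaway plugin files.
def demo_plugins
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "plugin_a.rb"),
               "$plugins_seen = ($plugins_seen || []) << :a\n")
    File.write(File.join(dir, "plugin_b.rb"),
               "$plugins_seen = ($plugins_seen || []) << :b\n")
    load_plugins(dir)
  end
end
```

Nothing in load_plugins tells an ahead of time compiler which files will exist when the program eventually runs.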


My compiler project has reached the point where dealing with something supposedly simple like "require" is now actually a stumbling block. Even something "simple" like Mspec from Rubyspec actually depends on Ruby code being executed to determine what files to require, and I have to decide whether to for now use "hacks" to get around it (Mspec and a lot of other Ruby code use a small set of common idiomatic ways of modifying the load path for the code being required - a tiny interpreter subset could take care of the basic cases) or do it "properly" (compiling to shared objects and use dynamic loading at runtime, but this kills a lot of optimizations; or JIT compilation of files that get required this way; or even JIT compilation as a means of executing the code to determine the files to require and then requiring them statically). Or I could just skip the problem for now, and just handle the specific case where files are required using a static string and/or add a hack that will use a substitution table or regexp to rewrite the require's (eww...)

Anyway, the point is that this is a big hurdle for anyone hoping to write an ahead of time compiler for Ruby without resorting to JIT compiling most of the program anyway. Part of the appeal of ahead of time compilation is to avoid JIT compilation (and avoid lugging around a ton of source files), so while supporting JIT compilation for pathological cases is good and/or necessary depending on how you look at it, for an ahead of time compiler to make sense you want to make that a "last resort" if there is no sensible alternative.

Problem #2: A very costly method dispatch


Let me count the number of ways Ruby makes method dispatch expensive in a naive implementation (see also my post on the Ruby Object Model as implemented in MRI)

  1. A Ruby method call involves first identifying the class of an object. This either requires following the "class" pointer of an object, OR a "decoding scheme" (as used in MRI) to allow small objects like Fixnum, True, False, Symbol and Nil to be encoded without a pointer.
  2. Then we must follow the "super" chain from class to class, potentially all the  way to Object, to determine which class can satisfy the method call. Then if that fails, it needs to do the same for #method_missing.
  3. Because the type of the object stored in any variable is unknown, a compiler cannot assume anything about the type of an object based on where it is stored. Unlike, say, C++, where the compiler will happily assume that a Foo * will hold a pointer to an object of class Foo or a subclass, and treat it accordingly, a Ruby compiler cannot. This also largely undermines inlining of methods as a viable optimization.
  4. .. and anyway, since Ruby classes are open, users can add, alias and remove methods at will (with some minor restrictions), so a Ruby compiler largely can't assume a method stays the same through the lifetime of the object.
  5. ... even worse, thanks to meta-classes in Ruby, there may conceptually be a new "almost" superclass inserted for an object.
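
The first two steps of the list can be modelled with a toy object model: a method table per class and a walk up the superclass chain, repeated for method_missing on failure. ToyClass and the helper names are hypothetical and have nothing to do with MRI's actual data structures.

```ruby
# A toy class object: a name, a method table, and a pointer to the superclass.
ToyClass = Struct.new(:name, :mtable, :parent)

# Walk the superclass chain until some class has the method.
def find_method(klass, name)
  while klass
    m = klass.mtable[name]
    return m if m
    klass = klass.parent
  end
  nil
end

# Naive send: look the method up, and on failure repeat the whole walk
# for :method_missing.
def toy_send(klass, receiver, name, *args)
  if (m = find_method(klass, name))
    m.call(receiver, *args)
  elsif (mm = find_method(klass, :method_missing))
    mm.call(receiver, name, *args)
  else
    raise NoMethodError, "undefined method #{name}"
  end
end

TOY_OBJECT = ToyClass.new(
  :Object,
  {
    :to_s => lambda { |recv| "#<#{recv}>" },
    :method_missing => lambda { |recv, name, *args| "missing: #{name}" }
  },
  nil
)

TOY_POINT = ToyClass.new(
  :Point,
  { :x => lambda { |recv| recv[:x] } },
  TOY_OBJECT
)

p toy_send(TOY_POINT, { :x => 5 }, :x)   # found in Point
p toy_send(TOY_POINT, :p, :to_s)         # found one level up
p toy_send(TOY_POINT, :p, :nope)         # two full walks, then method_missing
```

Every call pays for at least one hash lookup per level of the chain, and a miss pays for the walk twice.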

Luckily, there are a number of solutions that can vastly improve on the naive implementation of Ruby method dispatch. Unfortunately most (though not all) of them make the compiler and runtime more complex.

Problem #3: Meta-programming

It's alluded to above. Ruby allows extensive modification of classes and even objects at runtime. Defining new methods, inserting new modules or otherwise messing with the structure makes static analysis to optimize other problematic aspects of compiling Ruby very hard. 

Take even something simple like integer arithmetic. Ruby automatically handles overflow by turning Fixnums into Bignums, so even if you try to make arithmetic cheaper than a method call, you first have to check whether you are dealing with a Fixnum (which is not an ordinary object) or a Bignum (which is an ordinary object). And even if both values ARE Fixnums, you still face the uncertainty of whether or not a method of Fixnum has been replaced by unscrupulous monkey patchers...

Problem #4: No statically defined set of instance variables

In a language like C++, object size is kept down because the compiler knows at compile time what instance variables exist for any given object. As a result, it can pack them tightly together. In Ruby, theoretically an object can have new instance variables show up at any time, through mechanisms such as #instance_variable_set, #instance_eval etc. MRI solved this by storing the instance variables in a hash table, which is incredibly wasteful for otherwise small objects. Ruby 1.9 has reduced this impact somewhat by storing up to 3 instance variables in the object itself (see the Ruby1.9 section), and then falling back to an external structure.

Luckily, in practice most objects will have a small and mostly statically determinable instance variable set, and some of the same methods that can speed up method dispatch can be used to handle this more effectively as well.
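
As a small illustration of the problem (Point is just an example class), here are two objects of the same class that end up with different sets of instance variables, where the extra one is created under a name that is computed at runtime:

```ruby
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end
end

p1 = Point.new(1, 2)
p2 = Point.new(3, 4)

# Only p2 grows an extra instance variable, and the name is built at
# runtime, so it never appears literally in the source.
name = "@" + "label"
p2.instance_variable_set(name, "corner")

static_set  = p1.instance_variables   # the set a compiler could predict
dynamic_set = p2.instance_variables   # the set it could not
```

The two variables assigned in #initialize are the "mostly statically determinable" part; @label is the part that needs a fallback.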


Problem #5: Method visibility, default arguments and method arity


In C++ for example, you can easily know at compile time if a method is private or protected. In Ruby, since you don't know the class of the object you will be passed, you can't know that until runtime. This means this needs to be checked at runtime as well. We could imagine checking this only in private or protected methods, since use of them is relatively rare in Ruby code, but that's not easy either: 

Private methods in Ruby can only be called with self as the implicit receiver, which means they are only allowed to be called from within a method, and on the same object as the method is operating on. Slight little problem... How will the executed method know that this is the case? And then there are the ways of bypassing the check. An alternative for handling this is to pass along a flag saying "I promise I was called on self, honest!" (or used #__send__ etc. to bypass the check), but handling that without imposing overhead on calls that don't need it is non-trivial as well.
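
Here is a short example of the behaviour a compiler has to reproduce. Account and its methods are invented for the illustration; the visibility rules themselves are standard Ruby.

```ruby
class Account
  def initialize(balance)
    @balance = balance
  end

  # Calls the private method with an implicit receiver (self).
  def report
    "balance: #{secret_balance}"
  end

  # Tries to call the private method on another object.
  def peek(other)
    other.secret_balance
  end

  private

  def secret_balance
    @balance
  end
end

a = Account.new(10)
b = Account.new(20)

implicit = a.report                     # allowed

rejected = begin
  a.peek(b)                             # explicit receiver: rejected at runtime
rescue NoMethodError
  :no_method_error
end

bypassed = b.__send__(:secret_balance)  # #__send__ skips the visibility check
```

Note that all three calls end up at the same method body, so the body itself can't tell them apart without some extra information from the caller.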

Default arguments, and more generally method arity, suffer from the same problem. In C++ the caller can take responsibility for initializing default arguments when not providing values for all of the arguments. In Ruby, the compiler won't know when you are calling a method that has default arguments vs. one that just has fewer arguments (for that matter, you don't know for sure if the number of arguments is right at all), and so this has to be handled by the callee. That means the callee needs to get an argument count, or have another method of determining how many arguments were passed. That's not too bad.

But the callee then also has to contain the logic to initialize missing arguments. That's messier as it means either manipulating the stack frame, handling multiple ways of accessing an argument, making the arguments a "real" Ruby array and pushing the default arguments onto it, or otherwise moving arguments around to get them in a consistent location. There are simple ways of doing this (shove it all into a "real" Ruby Array object for example and convert the default argument initializers into the equivalent of ||= calls), and there are faster ways (keep things on the stack as much as possible, possibly mess with the stack frame, and only convert to a Ruby Array object for methods where it's actually treated as one).
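
A rough sketch of the "simple way", written as ordinary Ruby so it can be run. greet_compiled is a hypothetical stand-in for what a compiled method body might do when it is handed an argument count and a raw argument list; it is not how my compiler does it (it doesn't do it at all yet).

```ruby
# What the programmer wrote:
def greet(name, greeting = "Hello")
  "#{greeting}, #{name}!"
end

# Hypothetical callee-side handling: the caller passes a count (argc)
# and the raw arguments, and the callee checks arity and fills in
# missing defaults itself.
def greet_compiled(argc, args)
  if argc < 1 || argc > 2
    raise ArgumentError, "wrong number of arguments (#{argc} for 1..2)"
  end
  name     = args[0]
  greeting = argc > 1 ? args[1] : "Hello"   # default initializer runs in the callee
  "#{greeting}, #{name}!"
end

one_arg  = greet_compiled(1, ["Ann"])
two_args = greet_compiled(2, ["Ann", "Hi"])
```

I've used the argument count rather than a literal ||= here, since ||= would also replace an explicitly passed nil or false.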

Solutions


Below are some of my thoughts on how to address the bigger ones of these problems.

Solution #1: Speeding up method dispatch with a multilevel approach to dispatch

In "Protocol Extension: A Technique for Structuring Large Extensible Software Systems" (1994; original Postscript file; PDF from Citeseer X), Michael Franz presents a method for "protocol extension" in Oberon - a way of relatively effectively adding methods at runtime. The basic idea is that method changes (addition, overriding or removal) are rare, and method dispatch is frequent, so it makes sense to do more work when modifying the methods than it does when calling them.

Franz suggested that the "vtable" for each subclass is made to contain pointers to all the methods for the entire hierarchy. A method that is overridden in a subclass has its method pointer overwritten in that subclass and all descendants of that subclass that have not overridden it themselves.

The problem he noted is that in a big class hierarchy the vtables for each class may grow prohibitively large, while remaining mostly unchanged. This can be counteracted in a number of ways - one of them being splitting the vtable into chunks for "interfaces", and making each vtable a set of pointers to vtables for interfaces.

Another way is this:

The compiler (or compiler-writer) can make educated guesses about which methods are most likely to exist in all classes, and which methods are less likely to get called:

  • Methods in classes high up in the hierarchy are more likely to remain present. In Ruby methods on Object will exist in almost all classes (the only exception will be cases where the methods have been explicitly removed or aliased away).
  • Methods that are referenced in inner loops may be more important to make fast to call than others.
  • Analysis of the number of call-sites can also give an indication of how frequently a method will be called.

Methods that are present in more than a certain threshold of classes, or that are judged to be particularly performance sensitive, can be allocated an offset in a per-class vtable. Methods that are slightly less likely can be grouped into "interfaces", and require one level of indirection. As a last resort the implementation can be forced to fall back on a #send call that does lookups the way MRI does now. But see solution #2.

Solution #2: Polymorphic Inline Caches

Otherwise expensive method lookups can be cached. This caching can even be done "inline" in the code path by dynamically inlining the code for the classes that are seen at a call site in practice. Polymorphic inline caches were introduced in "Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches" by Urs Hölzle, Craig Chambers, and David Ungar.


This does not remove the problem of potentially expensive lookups, but it drastically reduces the impact in cases where the type used is relatively stable (which in practice is the common case - the same variable is rarely assigned more than a handful of different types).

The key to PIC for languages like Ruby is that you still need to check the type, and you either need to invalidate the old type when a class is modified, or you need to invalidate the caches. Maintaining that logic can be complicated (compare to the approach in solution #1, where all that is required is updating the vtables). PIC's can be combined with the approach above: Solution #1 provides cheap lookups, but #2 can still allow inlining of whole methods where appropriate, in a way that is safe.
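
To give a feel for the bookkeeping, here is a toy model of a single call site with a polymorphic cache, written in plain Ruby. CallSite, MAX_ENTRIES and the global Epoch counter are all assumptions made for this sketch. A real PIC would be generated machine code at the call site, and this toy ignores singleton methods entirely.

```ruby
# Global "something changed" counter. A real runtime would bump this
# from its method-definition hook; here it is bumped by hand.
module Epoch
  @value = 0
  class << self
    attr_reader :value

    def bump!
      @value += 1
    end
  end
end

class CallSite
  MAX_ENTRIES = 4
  attr_reader :misses

  def initialize(method_name)
    @method_name = method_name
    @entries = {}              # class => UnboundMethod
    @epoch = Epoch.value
    @misses = 0
  end

  def call(receiver, *args)
    # Invalidate everything if any class was modified since last time.
    if @epoch != Epoch.value
      @entries.clear
      @epoch = Epoch.value
    end
    klass = receiver.class     # the type check that still has to happen
    meth = @entries[klass]
    unless meth
      @misses += 1
      meth = klass.instance_method(@method_name)   # slow path: full lookup
      @entries[klass] = meth if @entries.size < MAX_ENTRIES
    end
    meth.bind(receiver).call(*args)
  end
end

class Dog; def speak; "woof"; end; end
class Cat; def speak; "meow"; end; end

site = CallSite.new(:speak)
animals = [Dog.new, Cat.new, Dog.new, Dog.new, Cat.new]

first_run = animals.map { |a| site.call(a) }
misses_after_first_run = site.misses   # one miss per distinct class seen

class Dog
  def speak; "WOOF"; end     # modify a class that is already cached
end
Epoch.bump!

second_run = animals.map { |a| site.call(a) }
```

Five calls cost two full lookups in the first run. The price is the invalidation: after Dog is reopened the cache has to be thrown away, and the lookups are paid again.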

Solution #3: Trace Trees


Dr. Michael Franz and Andreas Gal have been working on a technique called Trace Trees. Possibly the best introduction is in a blog post by Andreas Gal, but the papers are also fairly accessible.

Trace trees are the "hot new thing" - you'll find them used in the new JS JIT for Mozilla for example ("Tracemonkey"). The short description is that trace trees consist of tracking the execution of bits of an application - typically in a bytecode interpreter - and when a certain part is executed often enough, you "trace" the code execution and create a "tree" of code fragments. You use this to identify loops or frequently executed code paths, which are then optimized (in the bytecode interpreter case, this would involve JIT compilation) and protected by "guards" that verify whether to keep executing the newly generated native code path.

While the approach is intended for a bytecode interpreter, it has scope for being used to handle dynamic runtime optimization and inlining of specific code paths generated by an "ahead of time" compiler as well. The compiler can generate the best code it can, but inject timing and tracing code where appropriate to allow it to use information gathered at runtime to inline specific method calls into inner loops etc. An in-between alternative is to let the AOT compiler use profiling data gathered from past runs to build trees on subsequent compiles (this approach is also suggested in the papers on polymorphic inline caches).


Solution #4: Dynamic object packing (for instance variables)

We've already explored the building blocks for handling the troublesome issue of dynamic instance variables above. There are two parts to that problem: Quick access, which is hindered by having to resort to schemes that require expensive instance variable lookups, and space, which is hindered by a potentially dynamically changing set of instance variables.

First of all it is possible to apply a similar analysis to that suggested for vtables: Identify the most likely set of instance variables for objects of a class. Assign specific offsets for those instance variables, and compile code using static offsets for code internal to the class.

For instance variables that are not guaranteed to be ever used, there's a value judgement: If information is available to determine a rough likelihood you can use a cutoff to decide which to always include in the object. This has the potential for huge space overhead if the guess is wrong. The alternative is to fall back on a hash table referenced from the object to handle additional instance variables. This is costly in space and time if the number of objects with extra instance variables is high.

However the latter method can be combined with the earlier solutions to reduce the time overhead.
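
A minimal sketch of the "fixed slots plus fallback hash" idea, modelled in plain Ruby. PackedObject, the layout hash and the method names are all hypothetical. In compiled code the layout lookup for a predicted variable would be resolved at compile time to a constant offset, so only the overflow path would cost a hash lookup at runtime.

```ruby
class PackedObject
  def initialize(layout)
    @layout = layout                  # shared per class, e.g. { :@x => 0, :@y => 1 }
    @slots = Array.new(layout.size)   # tightly packed storage
    @overflow = nil                   # only allocated if it is ever needed
  end

  def ivar_get(name)
    if (i = @layout[name])
      @slots[i]
    elsif @overflow
      @overflow[name]
    end
  end

  def ivar_set(name, value)
    if (i = @layout[name])
      @slots[i] = value
    else
      (@overflow ||= {})[name] = value
    end
  end

  def overflow_used?
    !@overflow.nil?
  end
end

POINT_LAYOUT = { :@x => 0, :@y => 1 }.freeze

pt = PackedObject.new(POINT_LAYOUT)
pt.ivar_set(:@x, 1)
pt.ivar_set(:@y, 2)
overflow_before = pt.overflow_used?   # predicted variables need no hash

pt.ivar_set(:@label, "corner")        # unpredicted variable spills over
overflow_after = pt.overflow_used?
```

Objects that stick to the predicted set never pay for the hash table; only the odd ones out do.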

To reduce space usage further, we can dynamically pack the object: 

We can potentially cooperate with the garbage collector to identify pointers to an object, and then reallocate the object elsewhere and change the layout, and then use tracing similar to the one mentioned above to create optimized code paths that rely on the new static layout. If we don't want to move all objects, we can handle this by inserting "proxy classes" and making specific subsets of these objects instances of specific proxy classes. In fact, Ruby already sort-of does this by inserting proxies for Module's that are "hidden" for normal Ruby code when walking the inheritance chain.

This is a fairly complex solution, but one that can potentially allow very tight packing of objects, and avoiding a significant percentage of hash table lookups for instance variables. Combined with trace trees and PIC's, a significant part of the code overhead for method accessors used from outside the class can also be removed.






2009-04-16 22:56 UTC Writing a compiler in Ruby bottom up - Milestone: It can parse itself...

Yeah, I know, it's a long time since the last part. That doesn't mean things have stopped, though I've been extremely busy.
As always, if you don't know what I'm talking about, take a look at the series.

To see what I've been up to, take a look at the Github repository.

I'm not going to write a full part right now, but rest assured more is coming soon. Apart from being busy, a major reason I've held off is that I've been working towards this milestone:

The parser can now parse all of the *.rb files that make up the compiler itself. 
Note that this does NOT mean that it can compile itself, but it can parse itself into an AST. I'll cover some of the work needed to get there in the next part, and then I'll start working on extending it so it can compile larger and larger parts of itself. I'll break it into chunks as that is a fairly substantial amount of work (it will need a lot of runtime support such as partial implementations of Array, Hash, String, IO etc.)

Now, if you look at the commits, and think about the implications of the above I'm assuming you have a nagging suspicion that I'm moving towards actually writing a Ruby compiler (as in a compiler that compiles Ruby)...

All I'll say is that I am considering it. There are a lot of problems to compiling Ruby, many of which I'm going to address in a separate post, but let me leave you with an example to ponder... How do you handle this admittedly contrived (and untested) code sensibly in a compiler:
foo = `wget some-url`
foo.each_line { |line| require line.chomp }

class Bar
  puts "I'm inside Bar"
end


The point is that Ruby has no real distinction between "load time" and runtime, and so any Ruby compiler has to make a lot of tradeoffs (in other areas as well): Do you satisfy require's at compile time or at runtime? Do you execute any code at compile time (some Ruby code manipulates the load path for example, or does weird things to decide what to require, though likely not as bizarre as the example above)? Do you execute code in class definitions or generate code to build the classes at runtime?
Some decisions are likely to hurt compatibility; some are likely to require significant tradeoffs between performance and compatibility.

So while I'm heading in the direction of "sort of" compiling Ruby, I've not yet decided how much importance I'll put on completeness and compatibility with arbitrary Ruby code vs. creating a compiler that does things the way I think makes most sense (if you look at the parser, you'll for example start seeing how beastly the Ruby grammar can be, though I hope to be able to clean up the parser quite a bit again once I get a bit further).

Why compile Ruby at all, subset or not, you might ask?

I have a few reasons:

  • Distributing a single binary is practical at times (this indicates at least giving the option of satisfying require's at compile time)   
  • NOT having to distribute a whole runtime environment of an interpreter/vm and libraries can make things a lot easier.  
  • Startup time - It's hard to beat a static binary that only requires relocation for startup time; I've had external Ruby libraries add seconds to startup time for fairly simple apps because of lots of dependencies...  
  • A single binary is a simple way of guaranteeing that the libraries the app was tested with are the ones that actually get used on the client system.  

That doesn't mean I don't think there's plenty of space for VM's/interpreters - there's a lot to be said for the simplicity and flexibility of it. But a compiler allows a language to go a lot of places that one depending on a VM is unlikely to.

More importantly, I very strongly believe in the value of self-hosted systems. A compiler for Ruby, or a substantial subset of Ruby, with proper support for interfacing with the host platform also allows Ruby VM's or interpreters to be written in (nearly) Ruby. When I like a language I don't like having to dip down into another language to improve its implementation.


That said, I also have this urge to have a platform for experimenting with improvements to Ruby, and some of my focus going forwards will be on making the compiler easy to extend.



2009-03-10 23:36 UTC Writing a compiler in Ruby bottom up - step 19

This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.

If you've been following the commits to the Github repository, you've already seen this go in... Specifically, this was the state as at the end of this post. Here's finally some explanation.

The Object Model

Every object oriented language implements its own "brand" of object model. For this compiler I want to eventually approximate the Ruby object model. The Ruby model is, however, extremely dynamic, and extremely dynamic translates to hard to compile efficiently (a post on the problems facing compilation of Ruby is upcoming).

For most of this series I've ignored performance issues, but only because they've not been structural or really significant. The code we generate is messy and ugly and inefficient, but because of lack of optimization more than because the concepts are unsound.

I will leave the details of the problems with compiling the Ruby model for my later post, but let's boil it down to something very simple. Two criteria for making something easy to compile efficiently:

  • Low cost of determining the type of an expression at runtime.
  • Low cost of determining the method to call for a specific (type,method name) pair.

A naive implementation of the Ruby model falls down on both of these. So let's take a step back and see if we can approximate the dynamic features by layering some relatively simple approaches. This is not going to happen in one post, but by the end of this post we will have the infrastructure in place and be able to call methods.

First, let's take a look at some C code. C code?!? Every C-programmer that's written a huge project is likely to have come across or implemented an object system in C. It's easy thanks to ease of bit-fiddling and function pointers, so it makes it easy to illustrate the concepts. The following two sections are pretty basic if you already understand how object orientation in static languages is usually implemented.

A basic "static" object model


If you work in a statically typed language, efficient object orientation is pretty much trivial. The simplest example is a "non-virtual" member function - a function that is ALWAYS the one being called when you call a method of its name on an object stored in a variable of a specific type. In C++ this would be:

#include <cstdio>

class Foo {
public:
   void bar() {
       puts("Hello world");
   }
};

int main() {
   Foo test;
   test.bar();
}


Well, this isn't really anything but a function. In C we can do the same thing easily:

struct Foo {
};

void Foo_bar(struct Foo * this);

int main() {
   struct Foo test;
   Foo_bar(&test);
}


There's no real point in having the "this" pointer, but implicitly it's there in the C++ code, and in the C code we'd need it the second we want to add instance variables.

A non-virtual member function in a statically typed language is nothing more than a function where "this" (or "self" in other languages) is passed as an argument.
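
The same idea can be shown in Ruby itself, as a quick sanity check rather than anything the compiler does. Foo#bar here takes an instance variable into account so that "this" actually matters, and foo_bar is a made-up free function that receives the object explicitly:

```ruby
class Foo
  def initialize(name)
    @name = name
  end

  # Ordinary method: self is passed implicitly.
  def bar
    "Hello from #{@name}"
  end
end

# The same logic as a plain function with an explicit "this" argument.
def foo_bar(this)
  "Hello from #{this.instance_variable_get(:@name)}"
end

test = Foo.new("test")
via_method   = test.bar
via_function = foo_bar(test)
```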

A static model with virtual methods


The next step up is allowing "virtual methods". While non-virtual methods can't be overridden - the method that gets called is tied to the type of the pointer, not to the type of the object - virtual methods can. If you've ever had a programming class with any language like C++ or Java you should be familiar with this.

But how is it implemented? Enter the "vtable". A vtable is a table of pointers to virtual methods. The compiler will decide on the layout at compile time, implementing something like this (details omitted):

struct Foo;

struct Foo_vtable {
   void (* bar) (struct Foo * this);
};

struct Foo {
   struct Foo_vtable * vtable;
};

/* ... later after setup: */
struct Foo * test = new_Foo();
test->vtable->bar(test);

Foo_vtable's "bar" function pointer would be filled in with the address of Foo_bar from earlier, and methods can be called indirectly via the vtable. In Ruby the equivalent - sort of - is the "klass" pointer each object has (reachable inside Ruby - sort of - with #class). The indirection adds a small cost: You need to load the address of the method indirectly from an offset into the vtable, but with it comes great flexibility.

Typically a purely C based OO system would hide the "vtable" bit by offering wrappers, so you could do Foo_bar(test) just like before (the actual implementation would then be named something else), but under the hood it'd do the same.

The problem of the static model in a dynamic world

The approach above works great if all type information is known at compile time. It produces efficient code. But already in the case of virtual methods there's an overhead: The extra lookup. Theoretically this lookup can be omitted if the compiler can know at compile time which implementation can be called - if it can determine statically which specific class an object belongs to, rather than a shared ancestor. In practice few if any compilers do this.

But languages like Ruby take this to a whole other level. In Ruby you can add methods, rename methods, remove them, import modules and more. Each time this happens the actual method that gets executed when you run foo.bar can change.

But more importantly, in C++, if I have the variable "foo" that I know contains a pointer to an object, I can know enough about the type of that object by looking back at the type declarations. Specifically I can know the vtable layout, and so I can turn it into a simple array lookup to find the address of the method.

In Ruby that's simply not possible in most cases ("most" because static analysis could tell you the initial type of many objects, however even then things could quickly change at runtime) since a variable "foo" can hold an object of ANY type, and they may not really be related other than both being instances of subclasses of Object.

The original Ruby implementation solved this by checking each class in turn, starting with the object's class, and then the superclass of that class and so on, and using a hash table to match a method name to a method body for each class. This is slow.
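
To show where the cost comes from, here is a toy model of that lookup in Ruby. ToyClass, method_table and toy_send are invented names, and this is a simplification of the idea rather than MRI's actual code. It counts how many hash tables have to be probed for each call.

```ruby
class ToyClass
  attr_reader :name, :superclass, :method_table

  def initialize(name, superclass = nil)
    @name = name
    @superclass = superclass
    @method_table = {}          # selector => method body (a lambda here)
  end

  # Walk the superclass chain. Returns [body, number of tables probed].
  def find_method(selector)
    probes = 0
    klass = self
    while klass
      probes += 1
      body = klass.method_table[selector]
      return [body, probes] if body
      klass = klass.superclass
    end
    [nil, probes]
  end
end

def toy_send(klass, receiver, selector, *args)
  body, _probes = klass.find_method(selector)
  return body.call(receiver, *args) if body
  # Not found anywhere: walk the whole chain again for method_missing.
  missing, _probes = klass.find_method(:method_missing)
  raise NoMethodError, "undefined method #{selector}" unless missing
  missing.call(receiver, selector, *args)
end

object = ToyClass.new("Object")
object.method_table[:inspect] = ->(recv) { "#<#{recv}>" }
object.method_table[:method_missing] = ->(recv, sel, *args) { "no #{sel}" }

animal = ToyClass.new("Animal", object)
dog    = ToyClass.new("Dog", animal)
dog.method_table[:speak] = ->(recv) { "woof" }

own_probes       = dog.find_method(:speak)[1]
inherited_probes = dog.find_method(:inspect)[1]
```

A method defined directly on Dog costs one hash lookup, a method from Object costs three, and a missing method costs a full walk twice over.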

Minimizing the overhead

The first stab at the object model for my compiler will use a vtable. In fact, the version you're about to see doesn't support inheritance yet, so a vtable is trivial. However, let me describe how we'll address inheritance and working towards a model more or less as dynamic as Ruby's.

First of all we should realize that most objects share a lot of methods in any language like Ruby where all objects inherit from a common base class. All of these methods are in almost all classes because they're defined in Object (the exception being if a subclass specifically removes a method):

irb(main):003:0> Object.new.methods.sort
=> ["==", "===", "=~", "__id__", "__send__", "class", "clone", "display", "dup", "eql?", "equal?", "extend", "freeze",
"frozen?", "hash", "id", "inspect", "instance_eval", "instance_of?", "instance_variable_defined?", "instance_variable_get",
"instance_variable_set", "instance_variables", "is_a?", "kind_of?",  "method", "methods", "nil?", "object_id",
"private_methods", "protected_methods", "public_methods", "respond_to?", "send", "singleton_methods", "taint",
"tainted?", "to_a", "to_s", "type", "untaint"]

Since we know this is the case, there's no reason to do a complex lookup if we're willing to do a little extra work when creating classes and overriding methods:

  • When creating a new subclass, we copy the vtable of the parent.
  • When overriding a method, the vtable entry for that method for the class is modified, and so are the vtable entries for all subclasses that haven't provided an overriding method themselves.

This is actually pretty simple logic to implement - we just need to know what subclasses a class has.
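
The two rules above can be sketched in a few lines of Ruby. VClass and its methods are hypothetical, and a hash keyed by method name stands in for the array of slots a real vtable would use; the propagation logic is the part that matters.

```ruby
class VClass
  attr_reader :name, :vtable, :subclasses

  def initialize(name, parent = nil)
    @name = name
    @subclasses = []
    @own = {}                                   # selectors defined by this class itself
    @vtable = parent ? parent.vtable.dup : {}   # rule 1: copy the parent's vtable
    parent.subclasses << self if parent
  end

  def define(selector, body)
    @own[selector] = true
    install(selector, body)
  end

  # Dispatch is a single table lookup, with no walk up the chain.
  def call(selector, *args)
    @vtable.fetch(selector).call(*args)
  end

  protected

  # Rule 2: update this class, then every subclass that has not
  # provided its own version of the method.
  def install(selector, body)
    @vtable[selector] = body
    @subclasses.each do |sub|
      sub.install(selector, body) unless sub.overrides?(selector)
    end
  end

  def overrides?(selector)
    @own.key?(selector)
  end
end

base = VClass.new("Base")
base.define(:to_s, -> { "base" })
mid  = VClass.new("Mid", base)
leaf = VClass.new("Leaf", mid)
leaf.define(:to_s, -> { "leaf" })

base.define(:to_s, -> { "BASE" })     # redefine high up in the hierarchy
base.define(:hash_code, -> { 42 })    # add a brand new method after the fact
```

The redefinition reaches Mid, which never overrode to_s, but stops at Leaf, which did. All the work happens in #define; #call stays a single lookup.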

One problem though: Adding new methods, and what about handling a larger set of methods?

When we add a new method we need extra vtable slots, and that's obviously not going to scale, since every class in the same class hierarchy needs to have a slot for every method in the hierarchy. In a language like Ruby where ALL classes inherit from a single root - Object in Ruby's case - that means ALL classes will have a vtable slot for every method in the system. We face the problem of what happens if you have 100 classes, each adding its own methods, and all using different names.

Well... Oops. Actually it's not so bad: Since we can't know of all the methods at compile time we set a maximum size of the vtable. Within the allowed space we pack the methods that are shared across the largest set of classes. Then we just need a method for handling "spill". Worst case? We do like Ruby and reserve space for a hash table for each class - we get the best of both worlds: Very common methods can be looked up extremely cheaply, while the rest needs to resort to more work (but even for that there are lots of techniques we can use, such as various types of method caches, and something called "trace trees" you might have heard about if you're following what's going on with JavaScript implementations for modern browsers).

If a method is ... missing from a class, whether deleted or just not there, we simply add the address for the method_missing for that class (or the global one).
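
Putting the last few paragraphs together, here is a hedged sketch of a class with a fixed set of vtable slots, a spill hash for everything else, and method_missing stubs in the unused slots. SlottedClass, SLOT_OF and method_missing_stub are names I've made up for the illustration, and the choice of which selectors get slots is arbitrary.

```ruby
class SlottedClass
  # Selectors judged common enough to deserve a fixed slot.
  SLOT_OF = { to_s: 0, inspect: 1 }.freeze

  def initialize
    # Every slot starts out pointing at a method_missing stub, so the
    # fast path never needs to check for an empty slot.
    @vtable = SLOT_OF.keys.map do |sel|
      ->(*args) { method_missing_stub(sel, *args) }
    end
    @spill = {}
  end

  def define(selector, body)
    if (slot = SLOT_OF[selector])
      @vtable[slot] = body
    else
      @spill[selector] = body
    end
  end

  def dispatch(selector, *args)
    if (slot = SLOT_OF[selector])
      @vtable[slot].call(*args)     # fast path: indexed load and call
    elsif (body = @spill[selector])
      body.call(*args)              # slow path: hash lookup
    else
      method_missing_stub(selector, *args)
    end
  end

  private

  def method_missing_stub(selector, *_args)
    "method_missing: #{selector}"
  end
end

klass = SlottedClass.new
klass.define(:to_s, -> { "an object" })
klass.define(:frobnicate, ->(n) { n * 2 })

slotted  = klass.dispatch(:to_s)
empty    = klass.dispatch(:inspect)       # slot exists but was never filled in
spilled  = klass.dispatch(:frobnicate, 21)
unknown  = klass.dispatch(:nope)
```

In compiled code the slot number would be known at the call site, so the fast path would not involve looking anything up by name at all.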

For now we'll just add the vtable. Then we'll worry about handling "spill" and inheritance.


Implementation

An example, and bootstrap code


Here's what we'll make the compiler handle:

def __new_class_object(size)
  ob = malloc(size)
  %s(assign (index ob 0) Class)
  ob
end

class Class
  def new
    ob = malloc(4)
    %s(assign (index ob 0) self)
    ob
  end
end

class Foo
  def bar
    puts("test")
    self.hello
  end

  def hello
    puts("Hello World!")
  end
end

%s(let (f)
  (assign f (callm Foo new))
  (callm f bar)
)

The s-expression at the end is a stupid workaround - as it stands the compiler doesn't handle nested local variable scopes, and so we can't blindly wrap one around the entire program, and we don't have parser support for adding explicit let's (I don't want it - I like Ruby's approach to inferring new variables).
The other s-expressions are workarounds too - we don't have anything equivalent to "index" that can be used on the left hand side of an assign. Anyway, the first part here shows part of the bootstrapping of the object model. "__new_class_object" is used to return a raw slab of memory with a pointer to the class "Class" as the first thing.

Then we actually define the class Class, and implement a rudimentary "new" method. As you can see, for now it just allocates a 4-byte "object" and assigns "self" to the first 4 bytes, and then returns the new object. No call to "initialize" like in Ruby etc.

Apart from some very basic initialization which you'll see soon, I'd like to implement as much as possible this way, without adding runtime libraries written in a different language.

The rest is just a simple test. Note that for now I'm requiring an explicit receiver (hence "self.hello" instead of "hello").

Updating the parser


Here's the Github commit for the first parser change.

We're not doing much. Just making the parser handle this grammar fragment:

class ::= "class" ws* name ws* exp* "end"

Simple and straightforward (and nothing near as expressive as Ruby - note the absence of subclassing for starters).

The commit really does speak for itself, so I won't go into much more detail, other than to say I just noticed something I didn't like - namely that I patched "class" into "exp", and thus the grammar allows classes within classes - ignore that... If there's anything you don't get, feel free to ask in the comments.

The one other thing worth noticing is that I've added a "." operator:

 "."  => Oper.new(90, :callm,  :infix),

So now we need to add the :class and :callm constructs to the compiler proper, and that's a bit more involved.

Adding global constants

First things first. To have a way of storing the pointers to the class objects we need some form of global storage, and Ruby style global "constants" seem like an ok way of doing it. So take a look at scope.rb, which is where the scope classes have been relocated, and which holds this:

# Holds globals, and (for now at least), global constants.
# Note that Ruby-like "constants" aren't really - they are "assign-once"
# variables. As such, some of them can be treated as true constants
# (because their value is known at compile time), but some of them are
# not. For now, we'll treat all of them as global variables.
class GlobalScope
  attr_accessor :globals

  def initialize
    @globals = Set.new
  end

  def get_arg a
    return [:global,a] if @globals.member?(a)
    return [:addr,a]
  end
end
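
A quick usage example, with the class repeated so it runs on its own (the require of 'set' is my addition for standalone use; the symbols :Foo and :puts are just sample inputs):

```ruby
require 'set'

# GlobalScope as shown above, repeated so the example is self-contained.
class GlobalScope
  attr_accessor :globals

  def initialize
    @globals = Set.new
  end

  def get_arg(a)
    return [:global, a] if @globals.member?(a)
    return [:addr, a]
  end
end

scope = GlobalScope.new
scope.globals << :Foo            # what compile_class does for each class name

known   = scope.get_arg(:Foo)    # registered global constant
unknown = scope.get_arg(:puts)   # anything else is treated as an address
```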

#output_functions is then updated to do this:

   def output_constants
     @e.rodata { @string_constants.each { |c,l| @e.string(l,c) } }
+    @e.bss    { @global_constants.each { |c|   @e.bsslong(c) }}
   end

Also take a look at "emitter.rb" for #bss and #bsslong, and search compiler.rb for ":global" to take a look at how the space is allocated (#bss/#bsslong) and how data is moved into and out of the constant references.

The juicy bits - :callm and :class

If you looked through scope.rb you might have already noticed ClassScope and VTableEntry. If not, do that now - they're pretty simple, and for now mainly ensure we have a place to stuff the vtable information. Let's start with :class, which is used to actually define the class.

 def compile_class(scope,name,*exps)
    @e.comment("=== class #{name} ===")
    cscope = ClassScope.new(scope,name,@vtableoffsets)
    # FIXME: (If this class has a superclass, copy the vtable
    # from the superclass as a starting point)
    # FIXME: Fill in all unused vtable slots with __method_missing
    # FIXME: Fill in slot 0 with the Class vtable.
    exps.each do |l2|
      l2.each do |e|
        if e.is_a?(Array) && e[0] == :defun
          cscope.add_vtable_entry(e[1])
        end
      end
    end
    @classes[name] = cscope
    @global_scope.globals << name
    compile_exp(scope,[:assign,name.to_sym,
          [:call,:__new_class_object,[cscope.klass_size]]])
    @global_constants << name
    exps.each do |e|
      addr = compile_do(cscope,*e)
    end
    @e.comment("=== end class #{name} ===")
  end

This is hooked into #compile_exp in the usual way. It's quite straightforward. The goal is to compile the class definition straight into wherever it is found in the form of the code needed to allocate and initialize the class object with the vtable. First we create a new class scope, and we'll look at that later. Then we loop through the expressions inside the [:class, ...] array to see which of them are :defun's. For each of them we add a vtable entry to the class scope.

Then we add the class to a global hash, as well as add a global constant for the class. Then we call #compile_exp with an expression to call one of the bootstrap functions above - __new_class_object, with the size of the class object (calculated based on the vtable) and assigning the result to the freshly created global constant.

We then go through the expressions again, and compile the code contained in the class. We'll look at the modified #compile_defun next:

  def compile_defun scope,name, args, body
    # Ugly. Create a default "register_function" or something. 
    # Have it return the global name
    if scope.is_a?(ClassScope) 
      f = Function.new([:self]+args,body)
      @e.comment("method #{name}")
      fname = @e.get_local
      scope.set_vtable_entry(name,fname,f)
      @e.load_address(fname)
      @e.movl(scope.name.to_s,:edx)
      v = scope.vtable[name]
      @e.addl(v.offset*Emitter::PTR_SIZE,:edx) if v.offset > 0
      @e.movl(@e.result_value,"(%edx)")
      name = fname
    else
      f = Function.new(args,body)
    end
    @global_functions[name] = f
    return [:addr,name]
  end

As you can see, when given a class scope, this function will do more work: It'll create the function with a "local" name, just like we do with lambda's. Then the local name is added to the vtable entry so we can refer to it later.

Then we emit code to load the address of the function into %eax, and the address of the class object to %edx. We add the offset of where in the vtable the address of the method should be stored to %edx, and finally we save the address of the method (held in %eax) into the address pointed to by %edx. This is all very ugly and should be refactored so that the compiler code is less dependent on the architecture - we'll do that later.

Then we finish up just as before.

So, that's what it takes to define the class. Now we need to be able to compile method calls. Enter :callm:

  def compile_callm scope,ob,method, args
    @e.comment("callm #{ob.to_s}.#{method.to_s}")
    args ||= []
    @e.with_stack(args.length+1,true) do
      ret = compile_eval_arg(scope,ob)
      @e.movl(ret,:eax) if ret != :eax
      @e.save_to_stack(:eax,0)
      args.each_with_index do |a,i| 
        param = compile_eval_arg(scope,a)
        @e.save_to_stack(param,i+1)
      end
      @e.movl("(%esp)",:eax)
      @e.movl("(%eax)",:edx)
      off = @vtableoffsets.get_offset(method)
      raise "No offset for #{method}, and we don't yet implement send" if !off
      @e.movl("#{off*Emitter::PTR_SIZE}(%edx)",:eax)
      @e.call(:eax)
    end
    @e.comment("callm #{ob.to_s}.#{method.to_s} END")
    return [:subexpr]
  end

Eww... Compared to #compile_call it looks pretty nasty, but it's not that horrible. First we compile the expression provided for the object we call the method on, and save that to the allocated stack space. In effect we're allocating space for "self" (notice the "+1" added to args.length in the #with_stack call). Remember we've already added "self" to the list of arguments for the function in #compile_defun.

Then we compile the arguments as normal.

After that we get hold of "self" again, and move the address held at the first 4 bytes of the object (which is our vtable pointer) to %edx. Then we retrieve the vtableoffset for this method, and copy the address of the method from the vtable into %eax, and finally we call it, just as we would for #compile_call.

This isn't a very efficient implementation - we should avoid the extra stack access to get the method address, but we can optimize that later.
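If the assembly obscures what's going on, the dispatch can be modelled in a few lines of plain Ruby. This is a sketch under loud assumptions: an object is an Array whose slot 0 holds the vtable, the vtable is an Array of lambdas, and VTABLE_OFFSETS and callm are stand-ins I've invented for @vtableoffsets and the emitted code.

```ruby
# Hypothetical runtime model: an object is an Array whose slot 0 holds its
# class's vtable; a vtable is an Array of callables indexed by offset.
VTABLE_OFFSETS = { :hello => 0, :add => 1 }   # made-up global offset table

def callm(ob, method, args = [])
  off = VTABLE_OFFSETS[method]
  raise "No offset for #{method}" if !off
  vtable = ob[0]        # like: movl (%eax), %edx
  fn = vtable[off]      # like: movl off*PTR_SIZE(%edx), %eax
  fn.call(ob, *args)    # "self" goes first, as set up in #compile_defun
end

vtable = []
vtable[VTABLE_OFFSETS[:hello]] = lambda { |this| "hello from #{this[1]}" }
vtable[VTABLE_OFFSETS[:add]]   = lambda { |this, a, b| a + b }

obj = [vtable, "obj1"]   # slot 0: vtable pointer, slot 1: some data

callm(obj, :hello)       # => "hello from obj1"
callm(obj, :add, [2, 3]) # => 5
```

Because every method name gets one global offset, the call site never needs to know the class of the object. The cost is that every vtable needs a slot for every method name in the program.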



2009-02-23 03:06 UTC Writing a compiler in Ruby bottom up - step 18

This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.

Plugging in an operator precedence parser

The code as at the end of this part is available here.

A while back I wrote a post about writing a simple operator precedence parser. I did that with this in mind. Instead of writing, implementing and maintaining a set of grammar rules for each type of expression involving operators, I've plugged in an operator precedence parser. You may want to refer back to that post - I won't go into detail about this parser component itself since most of the inner workings are covered there, so this will be another short part and will rely heavily on you having read that post.

To illustrate why I prefer an operator precedence parser, let's look at a small grammar snippet that gives an idea of how it would be implemented in a recursive descent parser.

expr    ::= mulexpr (("+"|"-") mulexpr)*
mulexpr ::= primary (("*"|"/") primary)*
primary ::= number | "(" expr ")"

Now this looks simple, and it is: The looser an operator binds, the closer to the "top" of the expression it is, so the parser will look for tighter binding operators first, by following the rules down. 
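Converted to code by hand, those three rules might look something like the sketch below. It isn't taken from the compiler: TinyParser and its method names are made up, and it takes a pre-split array of tokens to keep things short.

```ruby
# Hand-written recursive descent parser for the three rules above.
# The class and method names are illustrative only.
class TinyParser
  def initialize(tokens)
    @tokens = tokens
  end

  # expr ::= mulexpr (("+"|"-") mulexpr)*
  def expr
    left = mulexpr
    while ["+", "-"].include?(@tokens.first)
      op = @tokens.shift
      left = [op.to_sym, left, mulexpr]
    end
    left
  end

  # mulexpr ::= primary (("*"|"/") primary)*
  def mulexpr
    left = primary
    while ["*", "/"].include?(@tokens.first)
      op = @tokens.shift
      left = [op.to_sym, left, primary]
    end
    left
  end

  # primary ::= number | "(" expr ")"
  def primary
    tok = @tokens.shift
    if tok == "("
      e = expr
      raise "Expected ')'" if @tokens.shift != ")"
      return e
    end
    raise "Expected number, got #{tok.inspect}" unless tok.to_s =~ /\A\d+\z/
    tok.to_i
  end
end

TinyParser.new(%w{1 + 2 * 3}).expr       # => [:+, 1, [:*, 2, 3]]
TinyParser.new(%w{( 1 + 2 ) * 3}).expr   # => [:*, [:+, 1, 2], 3]
```

Note how #expr and #mulexpr are the same method with different operators and a different callee. Every extra priority level means one more copy.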

But it gets tedious, especially when you convert it to code by hand. More importantly you need to add a rule like this for each and every priority level in the grammar. With an operator precedence parser, you instead get a table of priorities - changing how an operator binds is just a matter of changing that table. Let's look at the table for my parser as it currently stands:

Oper = Struct.new(:pri,:sym,:type)

Operators = {
  "," => Oper.new(2, :comma, :infix),

  "=" => Oper.new(6, :assign, :infix),

  "<" => Oper.new(9, :lt, :infix),
  ">" => Oper.new(9, :gt, :infix),
  "==" => Oper.new(9, :eq, :infix),
  "!=" => Oper.new(9, :ne, :infix),

  "+" => Oper.new(10, :add, :infix),
  "-" => Oper.new(10, :sub, :infix),
  "!" => Oper.new(10, :not, :prefix),

  "*" => Oper.new(20, :mul, :infix),
  "/" => Oper.new(20, :div, :infix),

  "[" => Oper.new(99, :index, :infix),
  "]" => Oper.new(99, nil, :rp),

  "(" => Oper.new(99, nil, :lp),
  ")" => Oper.new(99, nil, :rp)
}

Notice how "," is an operator - this is to handle function calls without any special handling in the shunting yard class (instead handling it by flattening the tree in the TreeOutput class) - adding special handling and avoiding any special checks in the TreeOutput class would also work. I didn't do any real analysis to find out which is best.

If you wanted to, you could mechanically convert this table to grammar fragments like the ones above fairly easily, and automatically generating the parser code wouldn't be hard either. But there's not really much point - as my earlier post showed, implementing the Shunting Yard algorithm for an operator precedence parser is fairly simple, and in this case it drops straight in. Adding a new operator to the parser is more or less a matter of adding to this table.
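To show how little code a priority table needs around it, here's a cut-down sketch. It is not the shunting yard class from the earlier post: PRI, reduce_top and shunt are names I've made up, it only handles left-associative infix operators on integers, and there are no parentheses or prefix operators.

```ruby
# Minimal table-driven sketch (infix only, everything left-associative).
# PRI is a cut-down stand-in for the Operators table, not the real thing.
PRI = { "+" => 10, "-" => 10, "*" => 20, "/" => 20 }

# Pop two values, push a tree node for the operator.
def reduce_top(vstack, op)
  right = vstack.pop
  left  = vstack.pop
  vstack << [op.to_sym, left, right]
end

def shunt(tokens, pri = PRI)
  vstack = []   # values and finished subtrees
  ostack = []   # operators still waiting for their right-hand side
  tokens.each do |t|
    if pri[t]
      # Anything on the stack that binds at least as tightly goes first.
      reduce_top(vstack, ostack.pop) while !ostack.empty? && pri[ostack.last] >= pri[t]
      ostack << t
    else
      vstack << t.to_i
    end
  end
  reduce_top(vstack, ostack.pop) while !ostack.empty?
  vstack.last
end

shunt(%w{1 + 2 * 3})                         # => [:+, 1, [:*, 2, 3]]
shunt(%w{1 + 2 * 3}, PRI.merge("+" => 30))   # => [:*, [:+, 1, 2], 3]
```

The second call changes how "+" binds by editing one number, with no change to the parsing code.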

The next big chunk that needed to be added was a tokenizer, as the shunting yard class presented in the earlier mentioned post requires a stream of tokens rather than characters. The tokenizer needs to recognize the type of token, and then attempt to read it from the scanner. Here's the tokenizer as it stands at the moment:
  class Tokenizer
    def initialize scanner
      @s = scanner
    end

    def each
      while t = get
        yield t
      end
    end

    def get
      @s.nolfws
      case @s.peek
      when ?"
        return @s.expect(Quoted)
      when ?0 .. ?9
        return @s.expect(Int)
      when ?a .. ?z , ?A .. ?Z
        buf = @s.expect(Atom)
        if (buf == :end || buf == :def) # FIXME: Make this a keyword lookup
          @s.unget(buf.to_s)
          return nil
        end
        return buf
      # Special cases - two character operators:
      when ?=
        @s.get
        return "==" if @s.peek == ?=
        return "="
      when ?!
        @s.get
        return "!=" if @s.peek == ?=
        return "!"
      when nil
        return nil
      else
        return @s.get if Operators[@s.peek.chr]
        return nil
      end
    end
  end

Overall it's pretty simple. The one thing really worth paying attention to is how "end" and "def" are treated as terminating an expression. This is an important issue when an operator precedence parser is intermingled with something else: You need to take care to identify the tokens that mean you've reached the end of an expression, and make sure they're not swallowed up by the tokenizer.
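Here's the same idea in isolation, as a sketch that uses Ruby's standard StringScanner instead of the compiler's own scanner. KEYWORDS and expression_tokens are hypothetical names, and the token regexp is deliberately crude.

```ruby
require 'strscan'

# Hypothetical stand-in for the Tokenizer: it stops at a keyword and leaves
# the keyword unread so the surrounding parser still gets to see it.
KEYWORDS = ["end", "def"]

def expression_tokens(scanner)
  tokens = []
  loop do
    scanner.skip(/[ \t]+/)
    start = scanner.pos
    tok = scanner.scan(/\d+|[a-zA-Z_]\w*|[-+*\/<>=!]/)
    break if !tok
    if KEYWORDS.include?(tok)
      scanner.pos = start   # "unget" the keyword
      break
    end
    tokens << tok
  end
  tokens
end

s = StringScanner.new("i + 1 end")
expression_tokens(s)   # => ["i", "+", "1"]
s.rest                 # => "end"
```

If the keyword were consumed here, the outer parser would never find the "end" that closes the enclosing block.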

The main change from the shunting yard implementation in the earlier post is a slight change to TreeOutput to specifically handle the :call syntax used in the compiler:

    def oper o
      rightv = @vstack.pop
      raise "Missing value in expression" if !rightv
      if (o.sym == :comma) && rightv.is_a?(Array) && rightv[0] == :comma
        # This is a way to flatten the tree by removing all the :comma operators
        @vstack << [o.sym,@vstack.pop] + rightv[1..-1]
      elsif (o.sym == :call) && rightv.is_a?(Array) && rightv[0] == :comma
        # This is a way to flatten the tree by removing all the :comma operators
        @vstack << [o.sym,@vstack.pop,rightv[1..-1]]
      else
        if o.type == :infix
          leftv = @vstack.pop
          raise "Missing value in expression" if !leftv
          @vstack << [o.sym, leftv, rightv]
        else
          @vstack <<  [o.sym,rightv]
        end
      end
    end

Specifically, the "elsif" arm is added to make sure the node comes out as [:call,function name,[arg1,arg2,...]]. We could have output [function name, arg1, arg2] instead, but that has a distinct disadvantage: You could have a function name that would collide with a primitive built into the compiler, and there'd be no end of confusing bugs. Forcing it into the [:call, function name, [args]] style prevents that from occurring.
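The flattening is easier to see outside the stack handling. The sketch below pulls the same three cases into a standalone helper. build_node is an illustrative name, not part of the compiler, and it assumes the commas arrive right-nested, which is what the check on rightv in #oper implies.

```ruby
# Standalone sketch of the flattening logic in #oper. Nodes are plain arrays.
def build_node(sym, left, right)
  if sym == :comma && right.is_a?(Array) && right[0] == :comma
    # Merge nested :comma nodes into one flat list
    [:comma, left] + right[1..-1]
  elsif sym == :call && right.is_a?(Array) && right[0] == :comma
    # Drop the :comma marker and keep the arguments as their own array
    [:call, left, right[1..-1]]
  else
    [sym, left, right]
  end
end

args = build_node(:comma, 1, build_node(:comma, 2, 3))   # => [:comma, 1, 2, 3]
call = build_node(:call, :f, args)                       # => [:call, :f, [1, 2, 3]]
```

Since the function name only ever appears in the second position, a user function called, say, "while" or "if" can't be mistaken for a compiler primitive.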

As with the s-expression parser, this parser component is tied into the main parser by instantiating it and referencing it from a member variable, by adding this to Parser#initialize: "@shunting = OpPrec::parser(s)". As an example of how the main parser is changed, consider #parse_condition:

  # condition ::= sexp | opprecexpr
  def parse_condition
    @sexp.parse || @shunting.parse
  end

At the end of this, the "testargs" example now looks almost like Ruby, apart from the "numargs" call, though only superficially so (remember - the language is still untyped; there are no objects or even type information at all; we'll get to a type system and object orientation soon, though we'll start with something much simpler than Ruby):

def f test, *arr
  i = 0
  while i < (numargs - 1)
    printf("test=%ld, i=%ld, numargs=%ld, arr[i]=%ld\n",test,i,numargs,arr[i])
    i = i + 1
  end
end

def g i, j
  k = 42
  printf("numargs=%ld, i=%ld,j=%ld,k=%ld\n",numargs,i,j,k)
end

f(123,42,43,45)
g(23,67)

Note that the operator table above is woefully incomplete - it doesn't even cover what our "runtime library" includes. I'll add to it gradually.

 



About me

E-mail: vidar@hokstad.com Skype: vhokstad
Twitter: vhokstad
View my LinkedIn profile.

I was born April 21st, 1975, in Oslo, Norway. Since 2000 I've been living in London, UK. I'm married and we just had our first child, Tristan Ikemefuna Hokstad.

I'm working for Aardvark Media as Director of Technology. I'm also currently on the board of SpatialQ, a startup in the GIS space, and an advisor to Skoach, a startup doing a time management app for people with ADD.
