2009-04-19 17:45 UTC The problem with compiling Ruby

Compiler technology is one of my hobbies, most recently satisfied with my project to write a series of post of how to write a (Ruby) compiler in Ruby. Since I really like Ruby, it's natural that I've done a fair amount of thinking about compiling Ruby, and what the problems with that are, and I decided it was time I wrote some of them down along with some thoughts on how they can be solved -- ideally without changing Ruby. Even more so since I decided I DO want to take the leap towards making my compiler project focus on actually compiling Ruby, and not just something somewhat like Ruby. 
All of the issues here affect a traditional "ahead of time" compiler - one that takes source in and spits out a binary, but many also affects JIT's.

I'll start with the problems, in no particular order:

Problem #1: No delineation of compile time and runtime

Many scripting languages fall in this category, but Ruby does it one better, by making the class definitions executable. Where a compiler for a language like C++ can do a lot of work to analyze and optimize code at compile time, and can build most structures related to classes at compile time, in Ruby it may in some cases be hard to avoid executing code at runtime to determine what the class will look like.

A potential Ruby compiler also needs to solve the issue of what files are compiled once at compile time and linked into the application, and what files are evaluated (and possibly JIT compiled) at runtime. This is a possibly tricky tradeoff - reloading code has become a common solution for Ruby web applications to avoid shutdown and restarts, for example, but there are no formal hints in Ruby code that tells the application what may or may not be attempted reloaded later. 

Reloading in Ruby tends to depend on the ability of the Ruby object model to replace classes and methods, so the most natural solution for handling reloading may simply be to compile the classes in statically but retain this ability. That still doesn't solve the first part of the problem, though: If my app starts with "require 'foo'", do I expect "foo" to be loaded when compiling or each time? You probably wouldn't be surprised if the compiler loaded it once. But what if it starts with code that reads the contents of a directory, and require's each file in turn? That one is trickier - some scripts may use that method as a crude "plugin" mechanism, for example.


My compiler project has reached the point where dealing with something supposedly simple like "require" is now actually a stumbling block. Even something "simple" like Mspec from Rubyspec actually depends on Ruby code being executed to determine what files to require, and I have to decide whether to for now use "hacks" to get around it (Mspec and a lot of other Ruby code use a small set of common idiomatic ways of modifying the load path for the code being required - a tiny interpreter subset could take care of the basic cases) or do it "properly" (compiling to shared objects and use dynamic loading at runtime, but this kills a lot of optimizations; or JIT compilation of files that get required this way; or even JIT compilation as a means of executing the code to determine the files to require and then requiring them statically). Or I could just skip the problem for now, and just handle the specific case where files are required using a static string and/or add a hack that will use a substitution table or regexp to rewrite the require's (eww...)

Anyway, the point is that this is a big hurdle for anyone hoping to write an ahead of time compiler for Ruby without resorting to JIT compiling most of the program anyway. Part of the appeal of ahead of time compilation is to avoid JIT compilation (and avoid lugging around a ton of source files), so while supporting JIT compilation for pathological cases is good and/or necessary depending on how you look at it, for an ahead of time compiler to make sense you want to make that a "last resort" if there is no sensible alternative.

Problem #2: A very costly method dispatch


Let me count the number of ways Ruby makes method dispatch expensive in a naive implementation (see also my post on the Ruby Object Model as implemented in MRI)

  1. A Ruby method call involves first identifying the class of an object. This either requires following the "class" pointer of an object, OR a "decoding scheme" (as used in MRI) to allow small objects like Fixnum, True, False, Symbol and Nil to be encoded without a pointer.
  2. Then we must follow the "super" chain from class to class, potentially all the  way to Object, to determine which class can satisfy the method call. Then if that fails, it needs to do the same for #method_missing.
  3. Because the type of the object stored in any variable is unknown, a compiler can not assume anything about the type of an object based on where it is stored. Unlike, say, C++, where the compiler will happily assume that a Foo * will hold a pointer to an object of class Foo or a subclass, and treat it accordingly, a Ruby compiler could not. This also largely affect inlining of methods as a viable optimization.
  4. .. and anyway, since Ruby classes are open, users can add, alias and remove methods at will (with some minor restrictions), so a Ruby compiler largely can't assume a method stays the same through the lifetime of the object.
  5. ... even worse, thanks to meta-classes in Ruby, there may conceptually be a new "almost" superclass inserted for an object.

Luckily, there are a number of solutions that can vastly improve on the naive implementation of Ruby method dispatch. Unfortunately most (though not all) of them make the compiler and runtime more complex.

Problem #3: Meta-programming

It's alluded to above. Ruby allows extensive modification of classes and even objects at runtime. Defining new methods, inserting new modules or otherwise messing with the structure makes static analysis to optimize other problematic aspects of compiling Ruby very hard. 

Take even something simple like integer arithmetic. Never mind that Ruby automatically handles overflow and turns Fixnum's into Bignum's, which means that even if you do try to make it cheaper than a method call, you first have to check whether you deal with a Fixnum (which is not an ordinary object) or a Bignum (which is an ordinary object), but if both values ARE Fixnum's you still face the uncertainty of whether or not a method of Fixnum has been replaced by unscrupulous monkey patchers...

Problem #4: No statically defined set of instance variables

In a language like C++, object size is kept down because the compiler knows at compile time what instance variables exist for any given object. As a result, it can pack them tightly together. In Ruby, theoretically an object can have new instance variables show up at any time, through mechanisms such as #instance_variable_set, #instance_eval etc.. MRI solved this by making the instance variables stored in a hash table, which is incredibly wasteful for otherwise small objects. Ruby 1.9 has reduced this impact somewhat by storing up to 3 instance variables in the object itself (see the Ruby1.9 section), and then falling back to an external structure.

Luckily, in practice most objects will have a small and mostly statically determinable instance variable set, and some of the same methods that can speed up method dispatch can be used to handle this more effectively as well.


Problem #5: Method visibility, default arguments and method arity


In C++ for example, you can easily know at compile time if a method is private or protected. In Ruby, since you don't know the class of the object you will be passed, you can't know that until runtime. This means this needs to be checked at runtime as well. We could imagine checking this only in private or protected methods, since use of them is relatively rare in Ruby code, but that's not easy either: 

Private methods in Ruby require self as an explicit receiver, which means they are only allowed to be called from within a method, and on the same object as the method is operating on. Slight little problem... How will the executed method know that this is the case? And then there's the ways of bypassing the check. An alternative for handling this is to pass along a flag saying "I promise I was called with self., honest!" (or used #__send__ etc. to bypass the check), but handling that without imposing overhead on calls that don't need it is non-trivial as well.

Default arguments, and more generally method arity, suffers from the same problem. In C++ the caller can take responsibility for initializing default arguments when not providing values for all of the arguments. In Ruby, the compiler won't know when you are calling a method that has default arguments vs. one that just have fewer arguments (for that matter, you don't know for sure if the number of arguments is right at all), and so this has to be handled by the callee. That means the callee needs to get an argument count, or have another method of determining how many arguments were passed. That's not to bad. 

But the callee then also have to contain the logic to initialize missing arguments. That's messier as it means either manipulating the stack frame, handling multiple ways of accessing an argument, making the arguments a "real" Ruby array and push the default arguments onto it, or otherwise moving arguments around to get them in a consistent location. There are simple ways of doing this (shove it all into a "real" Ruby Array object for example and convert the default argument initializers into the equivalent of ||= calls), and there are faster ways (keep things on the stack as much as possible, possible mess with the stack frame, and only convert to a Ruby Array object for methods where it's actually treated as one).

Solutions


Below are some of my thoughts on how to address the bigger ones of these problems.

Solution #1: Speeding up method dispatch with a multilevel approach to dispatch

In "Protocol Extension: A Technique for Structuring Large Extensible Software Systems" (1994; original Postscript file; PDF from Citeseer X), Michael Franz, presents a method for "protocol extension" in Oberon - a way of relatively effectively adding methods at runtime. The basic idea is that method changes (addition, overriding or removal) are rare, and method dispatch is frequent, so it makes sense to do more work when modifying the methods than it does when calling them. 

Franz suggested that the "vtable" for each subclass is made to contain pointers to all the methods for the entire hierarchy. A method that is overridden in a subclass has its method pointer overwritten in that subclass and all descendants of that subclass that has not overridden it themselves.

The problem he noted is that in a big class hierarchy the vtables for each class may grow prohibitively large, while remaining mostly unchanged. This can be counteracted in a number of ways - one of them being splitting the vtable into chunks for "interfaces", and making each vtable a set of pointers to vtables for interfaces.

Another way is this:

The compiler (or compiler-writer) can make educated guesses about which methods are most likely to exist in all classes, and which methods are less likely to get called:

  • Methods in classes high up in the hierarchy are more likely to remain present. In Ruby methods on Object will exist in almost all classes (the only exception will be cases where the methods have been explicitly removed or aliased away).
  • Methods that are referenced in inner loops may be more important to make fast to call than others.
  • Analysis of number of call-sites can also give an indication of how frequent a method will be called.
Methods that are present in more than a certain threshold of classes, or that are judged to be particularly performance sensitive can be allocated an offset in a per-class vtable. Methods that are slightly less likely can be grouped into "interfaces", and require a one level indirection. As a last resort the implementation can be forced to fall back on a #send call that does lookups the way MRI does now. But see solution #2

Solution #2: Polymorphic Inline Caches

Otherwise expensive method lookups can be cached. This caching can even be done "inline" in he code path by dynamically inlining the code for the classes that are seen at a call site in practice. Polymorphic inline caches were introduced in Optimizing Dynamically-Typed Object-Oriented Programming Languages with Polymorphic Inline Caches" by Urs HlzleCraig Chambers, and David Ungar.


This does not remove the problem of potentially expensive lookups, but it drastically reduces the impact in cases where the type used is relatively stable (which in practice is the common case - the same variable is rarely assigned more than a handful different types).

The key to PIC for languages like Ruby is that you still need to check the type, and you either need to invalidate the old type when a class is modified, or you need to invalidate the caches. Maintaining that logic can be complicated (compare to the approach in solution #1, where all that is required is updating the vtables). PIC's can be combined with the approach above: Solution #1 provides cheap lookups, but #2 can still allow inlining of whole methods where appropriate, in a way that is safe.

Solution #3: Trace Trees


Dr. Michael Franz and Andreas Gal have been working on a technique called Trace Trees. Possibly the best introduction is in a blog post by Andreas Gal, but the papers are also fairly accessible.

Trace trees is the "hot new thing" - you'll find it used in the new JS JIT for Mozilla for example ("Tracemonkey"). The short description is that trace trees consists of tracking the execution of bits of an application - typically in a bytecode interpreter, and when a certain part is executed often enough, you "trace" the code execution  and create a "tree" of code fragments. You use this to identify loops or frequently executed code paths that are optimized (in the bytecode interpreter case, this would involve JIT compilation), and protected by a "guard" that verify whether to keep executing the new native generated code path.

While the approach is intended  for a bytecode interpreter, it has scope for being used to handle dynamic runtime optimization and inlining of specific code paths generated by an "ahead of time" compiler as well. The compiler can generate the best code it can, but inject timing and tracing code where appropriate to allow it to use information gathered at runtime to inline specific method calls into inner loops etc.. An in-between alternative is to let the AOT compiler use profiling data gathered from past runs to built trees on subsequent compiles (this approach is also suggested in the papers on polymorphic inline caches)


Solution #4: Dynamic object packing (for instance variables)

We've already explored the building blocks for handling the troublesome issue of dynamic instance variables above. There are two parts to that problem: Quick access, which is hindered by having to resort to schemes that requires expensive instance variable lookups, and space which is hindered by a potentially dynamically changing set of instance variables.

First of all it is possible to apply a similar analysis to that suggested for vtables: Identify the most likely set of instance variables for objects of a class. Assign specific offsets for those instance variables, and compile code using static offsets for code internal to the class.

For instance variables that are not guaranteed to be ever used, there's a value judgement: If information is available to determine a rough likelihood you can use a cutoff to decide which to always include in the object. This has the potential for huge space overhead if the guess is wrong. The alternative is to fall back on a hash table referenced from the object to handle additional instance variables. This is costly in space and time if the number of objects with extra instance variables is high.

However the latter method can be combined with the earlier solutions to reduce the time overhead.

To reduce space usage further, we can dynamically pack the object: 

We can potentially cooperate with the garbage collector to identify pointers to an object, and then reallocate the object elsewhere and change the layout, and then use tracing similar to the one mentioned above to created optimized code paths that rely on the new static layout. If we don't want to move all objects, we can handle this by inserting "proxy classes" and making specific subsets of these objects instances of specific proxy classes. In fact, Ruby already sort-of does this by inserting proxies for Module's that are "hidden" for normal Ruby code when walking the inheritance chain.

This is a fairly complex solution, but one that can potentially allow very tight packing of objects, and avoiding a significant percentage of hash table lookups for instance variables. Combined with trace trees and PIC's, a significant part of the code overhead for method accessors used from outside the class can also be removed.






Comments - Newest first

2009-06-07 05:31 UTC
(A better reference for the Self papers is the Self website, which also has a Linux version if you want to try some hands on experience)

2009-05-13 12:24 UTC
Vidar Hokstad
Stephen,

Sort of... I agree with you that there are lots of cases where apps are really quite static and you can assume that an eval() is an oversight/error. A compiler switch to do that would be helpful in some instances.

"require" in Ruby is a little bit trickier, though, since it's not restricted to happening unconditionally, and even things like mspec is full of File.expand_path() etc., which is fine if you can guarantee that File.expand_path() hasn't been overridden to do something else.

For that matter, you don't know what "require" does without analysis, and in a fairly significant part of Ruby code "require" is *not* the built in "require" but rather the Rubygems one that does a lot of additional work.

Or you have funny little things like http_require. Yes, it loads code via http. Yes, it replaces the built in "require".

A compiler for (unchanged) Ruby either needs to ignore redefinitions of require, in which case it will be unable to find the code in many cases (I'd be fine with breaking something like http_require, but not Rubygems), it needs to emulate/reimplement the most common behavior (support the Rubygems behavior directly at compile time for example), or it needs to partially evaluate the program at compile time, or it needs to actually load/JIT the code.

My compiler project will hopefully eventually handle some of the simpler cases automatically, but others may need compiler switches to resolve.

Adding a JIT backend to my compiler actually shouldn't be too hard - it'd be possible in the current form just by replacing the Emitter class that spits out assembly - so once the compiler is closer to working form there's nothing really preventing linking the compiler into the apps to fully support eval and dynamic require's, but if you're first going to do an ahead of time compiler, you lose one of the biggest benefits if you don't do as much as possible of the work ahead of time and also allow taking advantage of it to reduce dependencies.

One way, though it'd require a small change to programs' would be to add some "pragma"'s [comments that affect compiler settings] to allow marking the end of the 'read' phase, and have the compiler treat everything prior to that as 'stuff to do at compile time if possible'.

2009-05-13 02:22 UTC
On #1 : really that's the "eval" problem, yes? Ruby "require" acts like eval(open(PATH).read)

It think if someone is compiling their code, one can (documentadly!) assume they don't want any interpreter/JIT hanging around, and read and compile at compile time... and throw errors on eval calls!

2009-05-03 19:38 UTC
Vidar Hokstad
Charles,

Interesting comments. Thanks.

The JVM, at least in the form of Hotspot, employs a monomorphic inline cache. Polymorphic caches are only a benefit if the overhead they introduce for managing multiple cache entries is outweighed by the benefit they provide. But in most modern OO systems, calls generally end up either monomorphic or megamorphic, for which PICs either introduce overhead or are unhelpful.

Unless you can *prove* that the call site is monomorphic you still need fallback code. That fallback code will never get called if the call site is genuinely monomorphic. If it is megamorphic, you'll hit limits and stop augmenting the cache pretty quickly.

The overhead of a polymorphic inline cache versus a monomorphic one is minimal. In either case you need a way of allocating the space for the cache, insert a type guard and inline the call. The only difference between a polymorphic and monomorphic cache in that case is whether you chain type checks on failure or go straight to the "worst case" fallback that needs a full lookup.

Anyway, in most of the Ruby code I work with, most code paths are polymorphic with a relatively low number of types. A typical example is variables that will either hold nil or an object of a specific type which tends to occur a lot, or the compiler I'm working on where the AST nodes hold either Array, String or Symbol, or code that contains Fixnum or Bigint. I can't think of any Ruby programs I've personally written that contains more half a dozen types or so in a single variable.

But I think it's important to realize that the JVM is a lot more than a nice GC and runtime environment...it's an optimizing dynamic language platform.

Sure it is, but it's also an "uninteresting target" for a lot of people. To me the JVM presents too much of a semantic mismatch to the languages I'm interested in, and I just plain don't like having to drag a JVM around, and I believe it's possible to do much better without all that much work - I guess we'll know in a while whether or not I'm right. Competition is good, though - JRuby certainly has the potential to present a lot more of a challenge than MRI.

2009-05-02 22:19 UTC
There seems to be a general lack of discussion about techniques the JVM employs, which would certainly be important here.

The JVM, at least in the form of Hotspot, employs a monomorphic inline cache. Polymorphic caches are only a benefit if the overhead they introduce for managing multiple cache entries is outweighed by the benefit they provide. But in most modern OO systems, calls generally end up either monomorphic or megamorphic, for which PICs either introduce overhead or are unhelpful.

Hotspot uses vtable dispatch for polymorphic calls, but can also perform profiled inlining of monomorphic calls across many levels. As a result these calls can be optimized as a single unit. Hotspot also is able to optimistically perform this inlining until situations change, at which point it can dynamically deoptimize even to the point of branching back into the interpreter...and then reoptimize later.

For instance variables in JRuby, we do a form of object packing by assigning instance variables a numeric index rather than performing a hash hit every time. The instance variable store then becomes a simple array attached to the object. Accessor logic for retrieving instance variables is call-site cached like any dynamic call, and for "monomorphic" instance variables the entire logic of retrieving a specific instance var will be inlined and optimized into a couple memory accesses.

There's an important difference, however, when targeting Hotspot: it continues to reoptimize the application as it runs and optimizes across *all* code. With only a primitive compiler, we've been able to achieve better performance than any of the production-ready Ruby implementations and comparable to several of the newer "experimental" implementations. But more importantly, we have been able to remain mostly compatible with Ruby 1.8 while posting substantial performance improvements for real applications like Rails. Getting microbenchmarks to run fast is easy; getting an entire system to run faster, from execution through to core classes, is a lot harder, and the JVM's on-line re-optimization makes it a lot easier for us.

We also have plans to start "helping" the JVM a bit more. Where currently our only major optimization is to lift local variables to registers, we will soon begin work on an optimizing compiler that can propagate type information, inline core class operations, eliminate localized numeric boxing, and a lot more. Those are standard techniques applied by non-optimizing compilers already under way...but then we pass that optimized bytecode on to the JVM for further inlining and optimization. We have lots of room to improve over the current limited, naive optimizations.

I'm sure the performance battles will continue to rage on, and I have no doubt that a production-quality Ruby implementation will emerge that can meet or exceed our performance for some applications. But I think it's important to realize that the JVM is a lot more than a nice GC and runtime environment...it's an optimizing dynamic language platform.

2009-04-26 20:12 UTC
Vidar Hokstad
Damien,

Seems like we mostly agree.

I'm not sure "simple inference" is all that is needed to regain the missing information, though: it's much easier to apply invalid optimizations than to regain performance. :)

For the general case, I agree. I believe though that a large class of apps fall in a category where they largely avoid constructs that are hard to optimize. For example, looking at my own compiler, a lot of the key objects are instantiated, assigned to an instance variable in a constructor of another object, and never modified. A large percentage of the method calls happen on those objects.

When this knowledge is combined with the knowledge that this app never calls eval(), nor define_method() or any of the other meta-programming constructs, and never tries to require/load code at runtime, then it's safe to replace all dynamic lookups on those objects with static calls, for example. I believe there's a fairly significant number of cases like that, where determining the absense of "dangerous constructs" can trigger a large set of additional optimizations (the same optimizations could still be applied in the presence of eval and friends, but you'd need to trap them and be prepared to trigger fallbacks if they change any of the objects or classes you've done this to). The same applies to access to instance variables and inlining accessors.

That's just scratching the surface.

Experience with the latter, however, led me to believe that despite its elaborate type (and interval) inference algorithms, the compiler tends to require user assistance to generate optimal code: type annotations, whole-program optimization and a judicious choice of data structures really make a difference.

Sure.. But I think this is something "casual programmers" of a language like Ruby don't really need, and more advanced programmers will be much more likely to consider to some extent when they have a choice of language implementations where it makes more difference. Today "fast" Ruby is still dog slow, and so you work around it - whether by throwing more servers at it or writing C-extensions.

But we're seeing a broad range of efforts to make Ruby fast, and when it's fast enough that learning to make the right design choices can save people from a lot of scaling pain, I think the people for whom performance matters *will* care. The rest will be happy about the speed increases they get for "free" and not worry too much about the extra benefits they could get, but that's ok.

Personally part of my motivation is that I increasingly don't want to have to deal with languages like C/C++. I wrote the code for my masters thesis in a mix of Ruby and C/C++, and had to rewrite chunks in C/C++ for performance and it was just aggravating when I knew that most of what I was doing was things that a proper optimizing compiler would have been able to make almost as fast.

And then, fixnum + fixnum can still overflow.

True, but personally I think that for most users the overhead can be made acceptable (the worst case cost for code that rarely overflow is an extra branch to fallback code that will rarely if ever take the code onto a fallback path), and for those few where it doesn't, a compiler could easily offer a pragma or compile time switch "I really know what I'm doing and there's no overflows here". I like options as long as the defaults are sane and they don't add too much complexity (in this case: omit the fallback path and any guards). I think that can be done for a lot of other optimizations as well.

I do agree with the comment you made about making the "typical case" fast too, but if I have the option of dipping down into C vs. turning some knobs on a compiler or making more sophisticated choices with my Ruby in order to meet my performance requirements, I know which I'd pick. Unfortunately today I often still don't really have the choice.

Vidar

(re: posting parts of this on your blog - go ahead)

2009-04-24 13:27 UTC
Vidar,

I'm not sure we disagree that much!

I meant that compilers add a lot of complexity compared to an interpreted language implementation. You are of course right that generating code in memory is not fundamentally different from building executable files, except for various compilation speed/code quality trade-offs one might want to apply.

I also agree that most applications are mostly static in practice, or the inline caching approach would not be very effective anyway. I'm not sure "simple inference" is all that is needed to regain the missing information, though: it's much easier to apply invalid optimizations than to regain performance. :)

Quite a few people cut their teeth on the problem: in addition to Shed Skin, which I mentioned in my previous post, compilers such as Stalin (http://cobweb.ecn.purdue.edu/~qobi/software.html), CMUCL and SBCL (http://www.sbcl.org) have been applying type inference to eliminate unused dynamism, sometimes with outstanding results.

Experience with the latter, however, led me to believe that despite its elaborate type (and interval) inference algorithms, the compiler tends to require user assistance to generate optimal code: type annotations, whole-program optimization and a judicious choice of data structures really make a difference. And then, fixnum + fixnum can still overflow.

The problem is that requiring the user to cooperate with the implementation is effectively changing the language. While our respective track records show that we are willing (and interested!) to investigate these issues, I'm not sure dynamic language ecosystems would be as vibrant as they are if their members felt guilty about using hash tables or (dynamically constructed) regular expressions all over the place--which is why I'm looking into speeding up these existing "patterns."

Cheers, -D

P.-S. -- Do you mind if I repost parts of this discussion on our blog? (With proper attribution and linking, of course.)

2009-04-23 00:09 UTC
Vidar Hokstad
Damien,

I disagree with you with respect to static compilers. They don't add any more complexity than a JIT/VM environment *worst case*, as the "naive approach" would be to just generate calls to the API of a JIT or combine bytecode with a JIT.

For a language like Ruby, the JIT functionality needs to be there anyway to handle things like eval(), and when you have either the "static" functionality or the JIT functionality, the other is reasonably trivial as long as you keep it in mind.

What you do get is an environment where you can apply more expensive optimizations in advance, and you also get the ability to reduce dependencies. I like having apps where I can, for example, just drop in a single binary and have a working system.

To me, for many apps the dynamic features and JIT will never be invoked. My experimental compiler itself, for example, is so far pretty static - it uses #send a couple of places, but I might revert that if it ends up having a big performance impact (premature optimization and all that), and most larger Ruby apps I've worked on make little use of the dynamism apart from on the surface. In fact, for a lot of them, a lot of static type information can be recovered through relatively simple type inference, and utilized by a compiler.

For others it will be essential, but not get any more complex than if I'd done the parsing and JIT'ing at runtime - the difference boils down to spitting out code to memory vs. spitting out code and relocation tables to ELF hunks (for Linux) or similar. Object files are trivial to generate on most platforms.

As for your notes on performance, I agree - Rubyists will need to learn about what constructs are *slow* as well, as the VM's and compilers won't be able to fix it all. I'm guilty of a lot of sins there myself too, such as using strings where I could use Symbol's etc..

2009-04-22 23:37 UTC
Hi Vidar,

Nice article, and good summary of the challenges. Static compilers are indeed not terribly useful for dynamic languages: they add a lot of complexity to the implementation, but not not necessarily yield a lot of benefits except on toy or very static programs.

(Speaking of efficiently compiling static programs, you might want to check out Shed Skin, an optimizing compiler for Python--gasp!--available at http://shed-skin.blogspot.com. Mark Dufour is seeing impressive results on selected code fragments.)

I believe most VMs go with a mix of solutions #2 and #3 above, the Mozilla folks being outliers with their "tracing" approach. While the latter is very interesting, its effectiveness remains to be proven; Lars Bak, for one, does not think it is a silver bullet (cf. http://tr.im/jj4L, or http://tr.im/jj4T for a WMV).

Our Ruby JIT/VM implementation (http://crosstwine.com/linker/ruby.html) employs a variant of inline caching with a few additional twists to take care of problems #1, #2, #3 and #5--which mostly boil down to one same thing at the implementation level--and solves #4 with a relatively straightforward per-class slot mapping (not included in the "alpha 1" release, but otherwise fully functional).

While it goes a long way to help, that infrastructure is not, however, sufficient to achieve "C speed" in most real-world scenarios. Excessive consing (memory allocation, especially for small boxed objects), the use of hash-tables as the ueber datastructure, and extreme reliance on regular expressions are but a few of the challenges that await the intrepid VM enthousiast!

Cheers, Damien

2009-04-22 19:11 UTC
Bob Lauer
The pypy guys have run into similar issues. Their approach (if I get it correctly) is to a) take a very abstract graph-based approach to the language, and b) define a subset of the language (pypy is written in python), and make that translatable to the defined backends (in the case of pypy: native code as well as various virtual machines, e.g. JVM, CLR).

cf. http://morepypy.blogspot.com/

2009-04-22 17:36 UTC
Vidar Hokstad
Eric, thanks for the pointers to the videos. I rarely get the time to watch stuff like that, but I'll try to fit it in sometime.

Isaac, "Lots" of compilers use PIC's. I guess it's probably the oldest of the techniques described above, given that the original paper was published in '91. It has lots of appeal because it's fairly simple to implement and it doesn't require any fancy analysis.

2009-04-22 17:14 UTC
Isaac Gouy
'Polymorphic inline caches"

fyi The VisualWorks Smalltalk implementation has used PICs since 1999.

http://www.cincomsmalltalk.com/CincomSmalltalkWiki/VisualWorks+5i.3+White+Paper

2009-04-22 16:16 UTC
Eric
Many of those ideas were discussed at RubyConf 2008. Have you seen the videos at http://rubyconf2008.confreaks.com/ ?

I'd recommend the following three:

* Rubinius by Evan Phoenix * Ruby Persistence in MagLev by Bob Walker, Allan Ottis * How Ruby Can Be Fast by Glenn Vanderburg

Perhaps there are others as well (Future of RubyVM by SASADA Koichi, JRuby: What, Why, How...Try It Now by Charles Nutter, Tom Enebo, and IronRuby by John Lam)

2009-04-21 21:34 UTC
Vidar Hokstad
Eric,

You're absolutely right, and the reference to polymorphic inline caches in particular is referring to one of the self papers.

2009-04-21 20:11 UTC
It's worth mentioning that most of these challenges are not specific to Ruby. In particular, the Self language presents most of the same problems, and it has had efficient compilers since the 90s. (Benchmarks at the time reached roughly 50% of the speed of C code, which would be amazing for Ruby.)

See here and here for more information.

It looks like trace trees vaguely resemble a generalization of Self's "optimistic compilation."

Post a Comment

Basic HTML allowed. Javascript required for anti-spam check (I am testing a new anti-spam measure. Problems commenting? Please e-mail me: vidar@hokstad.com)

About me

E-mail: vidar@hokstad.com Skype: vhokstad
Twitter: vhokstad
View my LinkedIn profile.

I was born April 21st, 1975, in Oslo, Norway. Since 2000 I've been living in London, UK. I'm married and we just had our first child, Tristan Ikemefuna Hokstad.

I'm working for Aardvark Media as Director of Technology. I'm also currently on the board of SpatialQ, a startup in the GIS space, and an advisor to Skoach, a startup doing a time management app for people with ADD.

Twitter Updates

    follow me on Twitter