Writing a compiler in Ruby bottom up - Milestone: It can parse itself... 2009-04-16


Yeah, I know, it's been a long time since the last part. That doesn't mean things have stopped, though I've been extremely busy.
As always, if you don't know what I'm talking about, take a look at the series.

To see what I've been up to, take a look at the GitHub repository.

I'm not going to write a full part right now, but rest assured more is coming soon. Apart from being busy, a major reason I've held off is that I've been working towards this milestone:

The parser can now parse all of the *.rb files that make up the compiler itself. 
Note that this does NOT mean that it can compile itself, only that it can parse itself into an AST. I'll cover some of the work needed to get there in the next part, and then I'll start working on extending it so it can compile larger and larger parts of itself. I'll break that into chunks, as it is a fairly substantial amount of work (it will need a lot of runtime support, such as partial implementations of Array, Hash, String, IO etc.)
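To make the difference between "parses" and "compiles" concrete, here is a minimal sketch of a parse-only check. It does NOT use the compiler's own parser: it assumes Ruby's standard `ripper` library as a stand-in, and `parses?` is a hypothetical helper name made up for this illustration.

```ruby
require 'ripper'

# Hypothetical helper: true if the source can be turned into a syntax tree.
# Ripper.sexp returns nil when the source has a syntax error.
# Nothing is compiled or executed here; we only check that a tree comes back.
def parses?(source)
  !Ripper.sexp(source).nil?
end

# Check every .rb file in the current directory, roughly what the
# milestone above means for the compiler's own source files.
failures = Dir.glob("*.rb").reject { |path| parses?(File.read(path)) }
puts failures.empty? ? "everything parses" : "failed: #{failures.join(', ')}"
```

A check like this says nothing about whether the files could be compiled, only that a tree comes out the other end.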

Now, if you look at the commits and think about the implications of the above, I'm assuming you have a nagging suspicion that I'm moving towards actually writing a Ruby compiler (as in a compiler that compiles Ruby)...

All I'll say is that I am considering it. There are a lot of problems with compiling Ruby, many of which I'm going to address in a separate post, but let me leave you with an example to ponder... How do you handle this admittedly contrived (and untested) code sensibly in a compiler:
foo = `wget some-url`
foo.each_line { |line| require line.chomp }

class Bar
  puts "I'm inside Bar"
end

The point is that Ruby has no real distinction between "load time" and runtime, and so any Ruby compiler has to make a lot of tradeoffs (in other areas as well): Do you satisfy requires at compile time or at runtime? Do you execute any code at compile time? (Some Ruby code manipulates the load path, for example, or does weird things to decide what to require, though likely nothing as bizarre as the example above.) Do you execute code in class definitions, or generate code to build the classes at runtime?
Some decisions are likely to hurt compatibility; some are likely to require significant tradeoffs between performance and compatibility.
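A tamer variant of the example above, as a runnable sketch: the file name passed to require only exists once the program runs, and the class body is ordinary code whose result depends on what has already been loaded. All the names here (LOG, the plugin file, Bar#greet) are made up for illustration.

```ruby
require 'tmpdir'

LOG = []

Dir.mktmpdir do |dir|
  # The file to load is created, and named, while the program is running.
  name = "plugin_#{Process.pid}"
  File.write(File.join(dir, "#{name}.rb"), "LOG << :plugin_loaded\n")

  # Manipulating the load path is ordinary code...
  $LOAD_PATH.unshift(dir)
  # ...and so is require: a method call with a computed argument.
  require name
  $LOAD_PATH.delete(dir)
end

class Bar
  # The class body is executed top to bottom when execution reaches it.
  LOG << :inside_bar

  # Which method gets defined depends on what happened earlier at runtime.
  if LOG.include?(:plugin_loaded)
    def greet; "plugin present"; end
  else
    def greet; "no plugin"; end
  end
end

LOG << Bar.new.greet
p LOG
```

A compiler that wants to resolve the require or lay out Bar ahead of time has to either run some of this code at compile time or restrict what such code is allowed to do.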

So while I'm heading in the direction of "sort of" compiling Ruby, I've not yet decided how much importance I'll put on completeness and compatibility with arbitrary Ruby code vs. creating a compiler that does things the way I think makes most sense (if you look at the parser, for example, you'll start seeing how beastly the Ruby grammar can be, though I hope to be able to clean up the parser quite a bit again once I get a bit further).

Why compile Ruby at all, subset or not, you might ask?

I have a few reasons:
  • Distributing a single binary is practical at times (this implies at least offering the option of satisfying requires at compile time).
  • NOT having to distribute a whole runtime environment of an interpreter/VM and libraries can make things a lot easier.
  • Startup time - It's hard to beat a static binary that only requires relocation for startup time; I've had external Ruby libraries add seconds to startup time for fairly simple apps because of lots of dependencies...
  • A single binary is a simple way of guaranteeing that the libraries the app was tested with are the ones that actually get used on the client system.

That doesn't mean I don't think there's plenty of space for VMs/interpreters - there's a lot to be said for their simplicity and flexibility. But a compiler allows a language to go a lot of places that a language depending on a VM is unlikely to reach.

More importantly, I very strongly believe in the value of self-hosted systems. A compiler for Ruby, or for a substantial subset of Ruby, with proper support for interfacing with the host platform also allows Ruby VMs or interpreters to be written in (nearly) Ruby. When I like a language I don't like having to dip down into another language to improve its implementation.


That said, I also have this urge to have a platform for experimenting with improvements to Ruby, and some of my focus going forwards will be on making the compiler easy to extend.

