Writing a compiler in Ruby bottom up - step 1/?? 2008-03-27


I've just posted the second part of this series. See part 2

I've been sitting on this for a long time - the oldest entries in the series I'm about to start posting is stuff I started writing back in early 2005 before I even started version 1 of this blog.

I've always enjoyed compiler technology, and I've written several toy compilers. A while back I started writing a simple compiler in Ruby, and I wrote copious notes along the way exactly with the intention of posting it somewhere. Problem was I was that by the time I was finishing up the steps I have so far, my blog was languishing, and so it's been just gathering dust.

I have about 30 parts mostly written, all the way to a simple but working compiler. It'll take me a while to post them, as at least some of them need some cleanup, and I don't have mjuch time to write, but I might post a couple of parts a week, depending on my schedule.

Without further ado, here's the first entry:

Some background, and the first tiny bit of code

Usually, when writing an experimental compiler I've started top down. In other words, I've taken the traditional approach of designing a parser first, whether manually or using a parser generator. I've then started writing various tree transformation code to turn the AST (Abstract Syntax Tree) into symbol tables, doing error checking, annotating the tree with various information, and then finally code generation. Often after I switched to Linux I've "chickened out" and compiled to C, which is ok if your language match well with C semantics, which it often will given how low level C is.

My first compiler though, was written in M68000 assembler and output M68000 assembler.

I bootstrapped it by allowing inline assembler all the way through, and gradually adding structure:

First I'd add functions and function calls, and start making use of them to structure my compiler. Then I'd add basic arithmetics that worked on assembler registers. Then I'd add local variables etc., but parse the assembly so that the register allocator would stay clear of registers used in the assembler that was interspersed.

Gradually the compiler morphed from M68000 to my hybrid language, and less and less assembler.

I want to bootstrap a compiler again. Though I have no intention of writing it in assembler this time. But I will start at the bottom: With the code generator. And I'll have the following goals:

Sounds like a lot of shortcut and still a lot of pain, doesn't it? But it'll be fun!

So lets start with the tiniest bit of useless code:


    #!/bin/env ruby
    
    class Compiler
    
      def compile(exp)
        # Taken from gcc -S output
        puts <<PROLOG
        .file   "bootstrap.rb"
        .text
    .globl main
        .type   main, @function
    main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
    PROLOG
    
        puts <<EPILOG
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
    EPILOG
    
        puts <<GLOBAL_EPILOG
        .size   main, .-main
    GLOBAL_EPILOG
      end
    end
    
    prog = [:puts,"Hello World"]
    
    Compiler.new.compile(prog)

It does nothing. Really. It's just the result of compiling this with gcc -S:


    int main()
    {
    }

... and breaking it into pieces.

It's a fully working compiler... Sort of.... Unfortunately it compiles any input into a noop, and so it's completely useless. But lets start with baby steps.

Consider this series of articles a "flow of consciousness". I do this for fun. I'm not about to sit down and make a nice clean design upfront. Nor am I going to be shy about changing my mind midstream and rip out code, or put stuff I previously ripped out back in again.

What you see here is my second pass: Each SVN commit I make is accompanied by notes for my posts, but the posts contain a lot of extra explanation. Sometimes that explanation makes me change my mind about things - but I won't make major changes unless I think it makes the explanations clearer or a point I made in the first pass is blatantly wrong. Even then I might leave it in, possibly with a note that I'll correct myself later. Sometimes I've contracted steps that took me a while to do but that are trivial to explain "after the fact".

Comments are very welcome, but keep in mind that while I may take your comments to heart, I have already written a lot of this series - I won't start rewriting large parts of it to take into account suggestions, but I will take notes and keep it in mind for whenever I catch up to what I've written already

Next time I'll make it actually do something with it's input, I promise.


blog comments powered by Disqus