Vidar Hokstad V2.0

Home Blog

2008-04-05 11:31 UTC Writing a compiler in Ruby bottom up - step 3

This is the third in a series. Please start on part 1 first if you haven't already to get the background, or see here for the full series so far

I hoped to post this earlier, but I've just been swamped this week - though cleaning up my old material so far have only taken about half an hour per post. Anyway, here is part three, and at the end I've posted a list of roughly what will be covered in the following parts. Roughly because I've tried grouping a few of them together with the intent of posting longer, more meaningful parts (this part is a good example why - it's very short), so I've cut my original thirty or so parts down to 20 (though that's just what I finished writing - that by no means finishes the job).

Chaining expressions with "do"

So far, the second version from last time can execute one single expression, and that's it. Not very useful. So I need to add a way to chain expressions like in the body of a function. As it stands, that's terribly simple. I'll add a keyword - "do" - that will accept an unlimited (meaning, in practice, limited by memory) list of expressions and evaluate them. It'll look like this:

prog = [:do, 
   [:printf,"Hello"], 
   [:printf," "], 
   [:printf,"World\\n"] 
] 

It's actually trivially simple. Right at the beginning of #compile_exp we add this:

    if exp[0] == :do                                                                                                                                                        
      exp[1..-1].each { |e| compile_exp(e) }                                                                                                                                
      return                                                                                                                                                                
    end

Recursion will play a great role here - you're descending a tree structure after all, so the core method to compile an expression will be called on deeper and deeper nodes as well, especially to handle proper sub-expressions, which is our next goal.

Adding sub-expressions, take 1

Here's the test we'd like to make work:

prog = [:printf,"'hello world' takes %ld bytes\\n",[:strlen, "hello world"]], 

Here's the first change. In #get_arg, we add this at the start:

    # Handle strings or subexpressions                                                                                                                                      
    if a.is_a?(Array)
      compile_exp(a)
      return nil # What should we return?                                                                                                                                   
    end

If you try to make the code above compile the test, you'll get an error from gcc, because we expect get_arg to return a sequence number for the string constants and nothing else, but that clearly doesn't work for sub expressions.

Adding sub-expressions, take 2: Return values

So how does GCC handle this. Getting the assembly for this:

int main()
{
  printf("'Hello world' takes %ld bytes\n",foo("Hello world"));
}

... results in this (just the relevant part of main):

    subl    $20, %esp
    movl    $.LC0, (%esp)
    call    foo
    movl    %eax, 4(%esp)
    movl    $.LC1, (%esp)
    call    printf
    addl    $20, %esp

... which shows that it's pretty straightforward. Gcc first calls the sub expression (foo), and then expects the return value of a function to be in the register "%eax", and so that needs to be copied onto the stack instead of the address of a string constant.

First up let's fix #get_arg:

  def get_arg(a)                                                                                                                                                            
    # Handle strings or subexpressions                                                                                                                                      
    if a.is_a?(Array)                                                                                                                                                       
      compile_exp(a)                                                                                                                                                        
      return [:subexpr]                                                                                                                                                     
     end                                                                                                                                                                    

seq = @string_constants[a] return seq if seq seq = @seq @seq += 1 @string_constants[a] = seq return [:strconst,seq] end

The only changes are the "return" expressions, where we indicate what is returned - this will expand considerably later.

The remaining change is pretty much a rewrite of the rest of compile_exp. Instead of just collecting the results of get_arg, we iterate over it and output directly (which is why the stack adjustment's also change, since the "args" array has gone away):

    stack_adjustment = PTR_SIZE + (((exp.length-1+0.5)*PTR_SIZE/(4.0*PTR_SIZE)).round) * (4*PTR_SIZE)

puts "\tsubl\t$#{stack_adjustment}, %esp" if exp[0] != :do

exp[1..-1].each_with_index do |a,i| atype, aparam = get_arg(a) if exp[0] != :do if atype == :strconst param = "$.LC#{aparam}" else param = "%eax" end puts "\tmovl\t#{param},#{i>0 ? i*4 : ""}(%esp)" end end

As you can see the change isn't that complex. We just check the return value from #get_arg and pick a string constant or %eax depending. This part will expand considerably as we add more different types of things that can be returned.

You can find the latest version here

Upcoming parts

These are the almost ready pre-written parts only. I expect once I need to start catching up on new parts the focus will be on a simple parser with the goal of making the compiler self-hosted (i.e. able to compile itself) as quickly as possible.

  • Step 4: Introducing a runtime, and defining functions
  • Step 5: Handle literals other than strings
  • Step 6: If ... then ... else
  • Step 7: Looping constructs
  • Step 8: Anonymous functions (lambda)
  • Step 9: Revisiting loops with anonymous functions, accessing function arguments
  • Step 10: Adding assignment and basic arithmetic
  • Step 11: A cleaner "while" loop
  • Step 12: Testing the language: Writing a simple text mangling program to make input cleaner
  • Step 13: Refactoring the code generation and starting to abstract out the target architecture
  • Step 14: Discussion of various concepts and future direction
  • Step 15: Adding arrays
  • Step 16: Local variables and multiple scopes
  • Step 17: Accessing variable length arguments
  • Step 18: Revisiting the text mangler - cleanups to test new functionality, and run it on itself
  • Step 19: Identifying the constructs needed to self host the compiler
  • Step 20: A start on a proper parser


Comments

2008-04-07 03:36 UTC
Thanks for the list showing what parts are comming. Very helpful! Perhaps when done you should stick them all in a single category or offer them as a single download.
2008-04-08 21:19 UTC
nefigah
I've enjoyed this so far--I've just started getting interested in compilers after a couple years of "run of the mill" programming experience and so I was excited to see this series listed in Zen & the Art of Programming's "This Week in Ruby."

Out of curiosity, as someone who has no prior ASM experience, are there any resources you could recommend to get me a bit more up to speed on that aspect? The examples shown so far are "more or less" understandable, but I'd like to learn more.

2008-04-10 13:19 UTC
Vidar Hokstad
boyter,

If you look in the compiler in Ruby bottom up category, they should all end up there, and it has a separate RSS feed too. I'm sure I'll wrap them all up or combine them whenever I reach a natural "conclusion" too.

nefigah,

I don't really have any specific resources to point you to - I mostly rely on using GCC and looking at the assembler output from C, but then I was helped a lot by knowing 6510 and 68000 assembler before I got my first x86 machine, so I knew most of the background. In fact, when I first got an x86 machine I was so disgusted by x86 assembler compared to 68000 assembler that it was what made me finally switch completely to C...

You could try this: http://en.wikibooks.org/wiki/X86_Assembly With the caveat that I haven't read it myself. There are lots of tutorials online though

One big pitfall: There are two syntaxes for x86 assembler. The main difference being whether the source or destination operand is the first listed. If you're on Linux and intend to use asm with gcc, the syntax used by "gas", the GNU assembler is your best bet - on other platforms I don't know what people prefer.

Post a Comment

Basic HTML allowed.

About me

E-mail: vidar@hokstad.com
Skype: vhokstad
View my LinkedIn profile

I was born April 21st, 1975, in Oslo, Norway. Since 2000 I've been living in London, UK. I'm married.

I'm working for Aardvark Media as Director of Technology. I'm also currently on the board of SpatialQ, a startup in the GIS space, and an advisor to Skoach, a startup doing a time management app for people with ADD.

Recent posts to my blog

Categories

StumbleUpon My link page

(Links I have stumbled and like)