Vidar Hokstad V2.0

Home Blog

Tag: compiler

2008-07-10 19:56 UTC Writing a compiler in Ruby bottom up - step 10

This is the tenth in a series. Please start on part 1 first if you haven't already to get the background, or see here for the full series so far

Uh, yeah. So much for posting the next part in a few days. I think I'll stop trying to second guess when I'll next have time (but sine I'm going off to Norway for vacation for a week, it's a safe bet the next part won't show up until later than that).

For those who care (anyone? anyone at all? didn't think so), I've been very busy lately, both with a big new project at work that'll see us relaunching the websites for three large restaurant chains in the UK. The exciting part of it isn't the code - that's pretty straight forward - but the infrastructure we're building, which is giving me a great opportunity to set up a very resilient OpenVz based setup with every single component virtualized.

My spare time has all been taken up by other projects which I might say more about some other time...

Anyone, back to the subject at hand...

What can we do so far?

Each step so far has included minor bits and pieces to test specific features. But it's hard to get an idea for how complete a language is becoming without writing something a little bit more substantial. In this case I've chosen to write a "text-mangler" that will "parse" an s-expression like (Lisp like) syntax and turn it into a Ruby script that will run the compiler with the equivalent program.

It will not be a proper parser, but merely a very simple case of rewriting pieces of text to turn something like this:

(foo "bar" 1 2 3)

into something like this:

[:foo, "bar", 1,2,3]

Simple enough to do, but it'll test quite a few things.

Preliminaries

First of all I'll separate the compiler from the "driver" - the little snippet of code containing the actual program, and instantiating and running the compiler. Essentially the entire Compiler class gets moved into a separate file. Here's a really stripped down starting point that illustrates how to use it, and how we'll start producing the "parser":

require 'compiler'

prog = [:do, [:puts, "require 'compiler'\\n"], [:puts, "prog = [:puts,'Program goes here']"], [:puts, ""], [:puts, "Compiler.new.compile(prog)"] ]

Compiler.new.compile(prog)

Notice that the program included here reproduces another program with the same structure. If run, it'll output another driver program containing a basic program itself.

Our goal is to expand on this program so that it will read a Lisp like program and output something almost exactly like above. The program above, in fact, should be exactly what we'd expect to get if "parsing"

   (puts "Program goes here")

The rules

What exactly will we "parse"?

We'll turn

  • [ ... ] into ( .... )
  • foo into :foo
  • 123 into 123
  • a, b into :a :b
  • "foo" into "foo"

There are lots of copouts here:

  • We won't tokenize properly, so lots of invalid input will create even more broken output
  • It won't handle escaped double quotes at all.
  • No error checking.
  • Did I mention no error checking?

This is throwaway code - I'm not intending to write a Lisp compiler - so I'm even less inclined than usual to spend time writing something robust. Consider this a test of the capabilities of the compiler so far, and a way of planning out what to add next. Nothing more.

So lets write the damn thing

First of all: This is an exercise in bootstrapping. First we write the "parser" in our current syntax. Then we'll manually translate it into the lisp like syntax it translate. Then we'll finally use itself to regenerate it's own Ruby syntax. This roundtrip will also serve as a test:

  • Make it parse itself to generate Ruby.
  • Run the Ruby to generate assembler.
  • Assemble the parser.
  • Run the result on itself again.
  • Compare the Ruby code generated in the first iteration with the one generated in the second. If they're not identical something is broken.

This is an IMPORTANT series of steps to keep in mind. Whenever you bootstrap a compiler for a new language or a part of one in the language it compiles so it can process it's own source, you want to make VERY sure that each step can process the next version of itself correctly, and that you archive every single version, since it's very easy to get yourself into a situation where you suddenly have a lot of uncompilable broken code when iterating the syntax and semantics of a new language.

First we spit out a driver as described above:

require 'compiler'

prog = [:do, [:defun, :parse, [:c,:sep], [] ],

[:puts, "require 'compiler'\\n"], [:puts, "prog = [:do,"], [:parse, 0,0], [:puts, "]\\n"], [:puts, "Compiler.new.compile(prog)"] ]

Compiler.new.compile(prog)

The code above defines a "parse" function for future use. It then outputs a prolog, calls the parse function, and outputs a tiny epilog, just like the first part of writing the compiler itself. Next we need to start filling in the parse function:

 [:defun, :parse, [:c,:sep],
    [:while, [:and, [:ne, [:assign, :c, [:getchar]], -1], [:ne, :c, 41]], [:do,
         ...
    ]]
 ]

Hard to read? What about if I rewrite it like this:

defun parse c,sep
  while (c=getchar) != -1 and c != 41
    ...
  end
end

Character 41 (decimal) is incidentally right paranthesis. It's a shortcut to make it trivial to use this function recursively to parse expressions.

So "c" is obviously the current character. What about "sep"? Sep will be use later to indicate whether or not to output "," to separate sub-expressions.

So why are they passed as arguments? Well, there's the first reminder for this step: We lack local variables. They'll be added shortly.

Next piece. Inside the while loop we add this:

     [:if, [:eq,:c, 40], [:do,
            [:printf, "["],
            [:parse,0,0],
            [:printf, "]"],
            [:assign, :sep, 1]
          ], ... (else bit goes here)
     ]

In other words: If we see a '(' (ASCII 40), we output '[', call parse() recursively, output ']' and indicate that a comma separator is needed on the next sub expression.

Then:

      [:if, [:eq, :c, 34], [:do,
              [:putchar,34],
              [:parse_quoted,0],
              [:putchar,34],
              [:assign, :sep, 1]
            ], (... else goes here)

If we see a double quote (ASCII 34), output double-quotes, call a new function "parse_quoted" to deal with it, output double-quotes again, and indicate we need a separator.

And here's the last chunk of "parse":

           [:do,
              [:if, [:and, [:isspace, :c], :sep], [:do,
                  [:printf, ","],
                  [:assign, :sep, 0]
                ]
              ],
              [:if, [:and, [:isalnum, :c], [:not, :sep]], [:do,
                    [:assign, :sep, 1],
                    [:if, [:not, [:isdigit, :c]],[:printf,":"]]                 
                ]
              ],
              [:putchar, :c]
            ]

If "c" is whitespace (calling isspace() from the c library) and "sep" is not 0, we output the separator, and clear the flag. Note that this relies on Ruby ignoring trailing "," in array declarations, so [1,2,3,] works the same way as [1,2,3].

Then, if "c" is alphanumeric (isalnum(c)) and "sep" isn't set, we'll indicate we need a separator and specifically if it's not a digit we'll assume it needs to be turned into a Ruby symbol and output ':', then we'll finally output the character. The reason we check to see that "sep" isn't set is to avoid outputing ":" in front of every letter.

That leaves parse_quotes, which looks like this:

  [:defun, :parse_quoted, [:c],
    [:while, [:and, [:ne, [:assign, :c, [:getchar]], -1], [:ne, :c, 34]], [:do,
        [:putchar, :c]
      ]
    ]
  ],

or:

defun parse_quoted c
  while (c=getchar) != -1 and (c != 34)
      putchar c
  end
end

Put it all together, and this is the result:

require 'compiler'

prog = [:do, [:defun, :parse_quoted, [:c], [:while, [:and, [:ne, [:assign, :c, [:getchar]], -1], [:ne, :c, 34]], [:do, [:putchar, :c] ] ] ], [:defun, :parse, [:c,:sep], [:while, [:and, [:ne, [:assign, :c, [:getchar]], -1], [:ne, :c, 41]], [:do, [:if, [:eq,:c, 40], [:do, [:printf, "["], [:parse,0,0], [:printf, "]"], [:assign, :sep, 1] ], [:if, [:eq, :c, 34], [:do, [:putchar,34], [:parse_quoted,0], [:putchar,34], [:assign, :sep, 1] ], [:do, [:if, [:and, [:isspace, :c], :sep], [:do, [:printf, ","], [:assign, :sep, 0] ] ], [:if, [:and, [:isalnum, :c], [:not, :sep]], [:do, [:assign, :sep, 1], [:if, [:not, [:isdigit, :c]],[:printf,":"]] ] ], [:putchar, :c] ] ] ] ] ] ], [:puts, "require 'compiler'\\n"], [:puts, "prog = [:do,"], [:parse, 0,0], [:puts, "]\\n"], [:puts, "Compiler.new.compile(prog)"] ]

Compiler.new.compile(prog)

Completely unreadable and a nightmare to debug. So lets translate it into our Lisp like syntax and see if it can process itself. Some regexps and manual massage later:

(do
 (defun parse_quoted (c)
   (while (and
           (ne (assign c (getchar)) -1)
           (ne c 34))
     (do (putchar c))
     )
   )

(defun parse (c sep) (while (and (ne (assign c (getchar)) -1) (ne c 41)) (do (if (eq c 40) (do (printf "[") (parse 0 0) (printf "]") (assign sep 1) ) (if (eq c 34) (do (putchar 34) (parse_quoted 0) (putchar 34) (assign sep 1) ) (do (if (and (isspace c) sep) (do (printf ",") (assign sep 0) ) ) (if (and (isalnum c) (not sep)) (do (assign sep 1) (if (not (isdigit c)) (printf ":")) ) ) (putchar c) ) ) ) ) ) )

(puts "require 'compiler'\\n") (puts "prog = [:do,") (parse 0 0) (puts "]\\n\\n") (puts "Compiler.new.compile(prog)") )

Yikes. More parenthesis than in actual Lisp. I wouldn't call it readable (particularly the lack of an "else" keyword or similar makes it hard to see where an else block begins, but it's an improvement. The real goal was to write a slightly larger test, anyway.

Lets test it by running it on "itself", then compiling the result, and run that on itself, and check that they match:

$ make parser
ruby parser.rb >parser.s
as   -o parser.o parser.s
cc    -c -o runtime.o runtime.c
cc   parser.o runtime.o   -o parser
$ ./parser parser2.rb
$ make parser2
ruby parser2.rb >parser2.s
as   -o parser2.o parser2.s
parser2.s: Assembler messages:
parser2.s:238: Warning: unterminated string; newline inserted
parser2.s:239: Warning: unterminated string; newline inserted
cc   parser2.o runtime.o   -o parser2
$ ./parser2 parser3.rb
$ diff -B parser2.rb parser3.rb
$

Nice.

Next time we'll go back to the compiler and do some refactoring, before adding some functionality (such as local variables) that would make the above example quite a bit cleaner. Once that's done it's time for a real parser for a much nicer syntax.

In the meantime the archive of the files for this round can be found here


2008-06-19 21:08 UTC Writing a compiler in Ruby bottom up - step 9

This is the ninth in a series. Please start on part 1 first if you haven't already to get the background, or see here for the full series so far

UPDATE: Fixed a bug in compile_while

Ok, so I know it's been far too long. Lots going on at the moment... This is a very short part, but I'll try to get the next part cleaned up in the next few days (teaser at the end...)

Implementing 'while' as a primitive

I like small language cores, and it really appeals to me to implement control structures as methods rather than hardcoding them into the language. As we've seen, 'while' can be implemented fairly easily in terms of a quite basic language. We could streamline it even more by reducing the verbiage needed to pass anonymous functions or by adding a Lisp style macro facility.

There's an added complication: Using lambda's that are not proper closures, which is the case for us so far, means there's no way for the body to access external variables. That makes the while loop pretty useless. The "proper" way of fixing that would be to actually implement proper closures, and I promise we'll get there, but not right now - there are other pieces to get in place first.

So I'm biting the bullet and adding a built in "while" construct for now. It also serves the purpose of showing how it's usually done - it's quite simple really.

As usual we start with a simple C test to see how to do this:

int main() {
  while (foo()) {
    bar();
  }
}

gcc -S, and this is the relevant part:

   jmp .L2
.L3:
    call    bar
.L2:
    call    foo
    testl   %eax, %eax
    jne .L3

Notice how the condition is placed at the end, and we start by jumping to the end? The alternative is to place the conditional check at the start, reversing the test and making it jump to after the end of the loop, and then placing an unconditional jump after the body, like the example below, but that results in one extra branch instruction per iteration of the loop:

.L3:
    call    foo
    testl   %eax, %eax
    je .L2
    call    bar
    jmp .L3
.L2:

This is not a big deal since performance isn't a concern right now, but the distinction here doesn't add any complexity to speak of. So here's what we do:

  def compile_while(scope, cond, body)
    start_while_seq = @seq
    cond_seq = @seq + 1
    @seq += 2
    puts "\\tjmp\\t.L#{cond_seq}"
    puts ".L#{start_while_seq}:"
    compile_exp(scope,body)
    puts ".L#{cond_seq}:"
    var = compile_eval_arg(scope,cond)
    puts "\\ttestl\\t#{var}, #{var}"
    puts "\\tjne\\t.L#{start_while_seq}"
    return [:subexpr]
  end

Then we add it to #compile_exp:

   return compile_while(scope,*exp[1..-1]) if (exp[0] == :while)

And a test:

prog = [:do,
  [:call, [:lambda, [:i],
      [:while,
        :i,
        [:do,
          [:printf, "Countdown: %ld\\n", :i],
          [:assign, :i, [:sub, :i, 1]]
        ]
      ]
    ], [10] ]
  ]

Wrapping it up

Like last time, I've tar'ed up the files for step 9 together with a Makefile here. If you have Ruby, make and gcc installed you should be able to just untar it and run make to get the "step9" test binary:

[vidarh@dev step9]$ make
ruby step9.rb >step9.s
as   -o step9.o step9.s
cc    -c -o runtime.o runtime.c
cc   step9.o runtime.o   -o step9
[vidarh@dev step9]$ ./step9 
Countdown: 10
Countdown: 9
Countdown: 8
Countdown: 7
Countdown: 6
Countdown: 5
Countdown: 4
Countdown: 3
Countdown: 2
Countdown: 1
[vidarh@dev step9]$ 

Next time

Next time we'll start putting together and going through a simple "text mangler" that'll let us "parse" a lisp like syntax and generate a Ruby script that runs the compiler on the equivalent program...

It will turn this (not the complete program):

(defun parse () (let (c sep expr) 
  (assign sep 0)
  (assign expr 0)
  (while (and (ne (assign c (getchar)) -1) (ne c 41))
        (if (eq c 40)
          (do (copy_expr) (assign sep 1))
          (if (eq c 59)
                (parse_comment)
                (if (eq c 34) 
                  (do (parse_quoted c) (assign sep 1))
                  (do
                   (if (and (isspace c) sep) (do (printf ",") (assign sep 0)))
                   (if (and (isalnum c) (not sep)) 
                           (do (assign sep 1) (if (not (isdigit c)) (printf ":")))
                         )
                   (putchar c)
                   )
                  )
                )
          )
        )
  ))

into this plus driver code to run the compiler:

[:defun, :parse, [], [:let, [:c, :sep, :expr], 
  [:assign, :sep, 0],                             
  [:assign, :expr, 0],
  [:while, [:and, [:ne, [:assign, :c, [:getchar]], -1], [:ne, :c, 41]],
        [:if, [:eq, :c, 40],
          [:do, [:copy_expr], [:assign, :sep, 1]],
          [:if, [:eq, :c, 59],
                [:parse_comment],
                [:if, [:eq, :c, 34], 
                  [:do, [:parse_quoted, :c], [:assign, :sep, 1]],
                  [:do,
                   [:if, [:and, [:isspace, :c], :sep], [:do, [:printf, ","], [:assign, :sep, 0]]],
                   [:if, [:and, [:isalnum, :c], [:not, :sep]], 
                           [:do, [:assign, :sep, 1], [:if, [:not, [:isdigit, :c]], [:printf, ":"]]],
                         ],
                   [:putchar, :c],
                   ],
                  ],
                ],
          ],
        ],
  ]],

By the end of it it will process itself and spit out the equivalent parse-tree encoded in Ruby. Note how I've put "parse" in quotes etc.. It hardly deserves to be described as a parser - we'll get to a real parser a few steps further out.

Note that I am not going to go down towards writing a Lisp, but I figured since I've used a Lisp like syntax so far for examples, it would be a good test to see whether we're far enough along to do something remotely useful yet. It did need a couple of helper functions beyond what's present so far to get a decent result, but not much. The "real" parser will be for a much more Ruby-ish syntax.

Read more (1 comment)

2008-06-01 12:07 UTC Writing a compiler in Ruby bottom up - step 8

This is the eight in a series. Please start on part 1 first if you haven't already to get the background, or see here for the full series so far

Last time we looked at an improved way of handling loops etc. using anonymous functions. But most of that is relatively limited if there's no way of modifying variables. Sure, you can get away with recursion and not allow mutation or even other side effects at all. It might satisfy some functional programming purists, but it would not satisfy me and it would also require some fairly sophisticated optimizations to get decent performance.

I like the principle of reducing side effects as much as possible, and pushing side effects up, so much so that it's one of the factors I pointed out in my post about reducing coupling through unit tests. But I don't like sacrificing simplicity and/or performance for the sake of a "purer" approach.

So lets introduce assignment, and at the same time add in basic arithmetic and comparisons.

Assignment

We'll cheat and keep what we introduce into the actual compiler as minimal as possible. That means assignments go into the compiler, and arithmetic and comparisons are confined to a new C runtime for simplicity for now. It is temporary, I promise. Just another shortcut.

Why don't we give assignment the same treatment? Well, we could, if we introduced a transparent way of passing the address of a variable as an argument to a function, but it would still mean compiler changes, would be slower and it seems it wouldn't actually bring any benefits.

Another reason for not taking that route is that it opens up a can of worns: If we allow taking the address of stack variables, it gets easy to shoot yourself in the foot by accessing it after the stack frame is dead. We could accept that risk (it's not like this compiler is a bastion of safety at the moment, with mostly no typing and no error checking to speak of), or we can introduce "environments" and guarantee that any variable you take the address of will live on the heap and stay alive until you lose the reference... But that's more complicated than I'd like right now. Environments is one of the approaches to keeping local variables "alive" in the heap that would let us do proper closures instead of mere lowly anonymous functions, though, so we might eventually bite the bullet.

Adding assignment to the compiler is trivial anyway.

Here's the first try:

  def compile_assign scope, left, right 
    source = compile_eval_arg(scope, right) 
    atype, aparam = get_arg(scope,left) 
    raise "Expected a variable on left hand side of assignment" if atype != :arg 
    puts "\tmovl\t#{source},#{PTR_SIZE*(aparam+2)}(%ebp)" 
    return [:subexpr] 
  end 

Notice our first error message... We'll eventually need to bite the bullet and add lots more of those, as well as handling them a bit better than just letting the compiler fail with an exception message and a stack trace, but it's a start.

What this function does is quite straightforward: It compiles the right hand expression, then copies the result into the variable.

Can you spot the problem?

It's not a bug, it's a feature! Right... It only allows assignment to a local variable. Whether that is indeed a bug or a feature is open for discussion - many languages does allow for assignment to function/method arguments as well as local (and sometimes global) arguments. I promise we'll look at this closer later, but for now it's simply not important.

We then update #compile_exp by adding this:

   return compile_assign(scope,*exp[1..-1]) if (exp[0] == :assign) 

And test it with this (the output needs to be linked with the arithmetic functions in the next step, or rather it at least needs :sub):

prog = [:call, [:lambda, [:i],
    [:do,
      [:printf, "Testing sub and assign (before): %ld\\n", :i],
      [:assign, :i, [:sub, :i, 1]],
      [:printf, "Testing sub and assign (after): %ld\\n", :i],
    ]
  ], [10] ]

Arithmetic and comparisons

Here's our new C runtime:

signed int add(signed int a, signed int b)
{
  return a + b;
}

signed int sub(signed int a, signed int b) { return a - b; }

signed int div(signed int a, signed int b) { return a / b; }

signed int mul(signed int a, signed int b) { return a * b; }

signed int ne(signed int a, signed int b) { return a != b; }

signed int eq(signed int a, signed int b) { return a == b; }

signed int not(signed int a) { return !a; }

// Note that our "and" won't shortcircuit, as the // evaluation happens before this function is called. signed int and(signed int a, signed int b) { return a && b; }

signed int gt(signed int a, signed int b) { return a > b; }

signed int lt(signed int a, signed int b) { return a < b; }

Not very exciting is it?

I'm not going to try to make it very exciting, but it's worth considering what would be involved in adding compilation of it, though I won't complete that now. Lets look at a single example, "lt", and show the process, though you should be used to it by now. gcc -S on just the "lt" function gives:

lt:
    pushl   %ebp
    movl    %esp, %ebp
    movl    8(%ebp), %eax
    cmpl    12(%ebp), %eax
    setl    %al
    movzbl  %al, %eax
    popl    %ebp
    ret

(Uh-oh, what's that setl / movzbl stuff? This is an optimization - cmpl compares the two operands and set a number of condition codes. "set[something]" sets a value in a single byte register %al based on the condition codes - in this case "l" for less. "movzbl" then takes that byte, extends the value up to 32 bits by filling the remaining bits with 0 and puts it into %eax - don't worry about it)

To add this to the compiler we'd do compile_eval_arg on the left and right arguments. After doing it to the left, we'd need to store the result somewhere temporarily, in case we try to use %eax again. We'd then need to issue the cmpl/setl/movzbl instructions, and we'd be done.

The function could look something like this:

def compile_lt scope,left,right
    l = compile_eval_arg(scope, left)
    puts "\tmovl\t#{l},%ecx"
    r = compile_eval_arg(scope, right)
    puts "\tcmpl\t%ecx, %eax"
    puts "\tsetl\t%al"
    puts "\tmovzbl %al,%eax"
    return [:subexpr] 
end

Note that this is completely untested - I haven't even checked that I got the operands to cmpl in the right order.

More importantly, this illustrates one reason for not jumping into this and spending time on doing the same thing for all of these functions right away: Notice that we needed a temporary register? Notice how that was wasteful, because we could have just put things into %ecx right away? Notice how this is a recipe for a lot of pain, because we might have even more complex situations where it is handy (though not strictly necessary) to have even more temporary registers?

Register allocation is a huge subject in itself. It boils down to figuring out the most efficient way of stuffing a large number of temporary and long lived variables into a very scarce (on the x86 in particular) resource, namely registers. You want to make the most of them, because access to registers is generally far faster than memory accesses, and because many architectures have instructions that requires the use of specific registers for specific purposes.

It gets complicated fast, because while using a register when you can is good, "spilling" - that is the process of storing a variable back to memory and loading another variable it it's place - when you have more variables in use than registers can get even more expensive than accessing things straight from memory in the first place. Figuring out what combination of variables is most efficient to keep in registers or move in and out of registers is something that can make grown men cry, and many simple compilers just ignore this problem and pick a very simple and inefficient approach.

At least reading the wikipedia article linked to above is worthwhile if you want to know more about this, though a few searches should turn up a lot of papers. I don't intend to add any advanced register allocation to this compiler until it's far closer to being functionally complete. But we will need to deal with some of it, but it's something I'd rather defer doing a proper implementation of until more functionality is in place for the simple reason that doing it well is a hell of a lot of work, and doing a half assed job at it will at least make things work, even though performance will suffer.

Wrapping it up

Since there's now a separate file for the C runtime support, I've tar'ed up the files for step8 together with a Makefile here. If you have Ruby, make and gcc installed you should be able to just untar it and run make to get the "step8" test binary:

$ make
ruby step8.rb >step8.s
as   -o step8.o step8.s
cc    -c -o runtime.o runtime.c
cc   step8.o runtime.o   -o step8
$ ./step8 
Testing sub and assign (before): 10
Testing sub and assign (after): 9
$ 


2008-05-16 19:13 UTC Writing a compiler in Ruby bottom up - step 7

This is the seventh in a series. Please start on part 1 first if you haven't already to get the background, or see here for the full series so far

I've combined two of the planned parts this time, what was in the list from last time as parts 7 and 8.

Making use of lambda / call

We can implement loops using recursion "manually", but adding lambda's now should make it possible to create a slightly cleaner version by actually defining a "while" function:

   (defun while (cond body) 
      (if (call cond ()) (do (call body ()) (while cond body) ()))
    )

In a way, this isn't so far from Ruby blocks, except that we don't provide the syntactic sugar that allows the blocks to be defined without any extra vocabulary, so in use our "while" function would look like this:

   (while (lambda () (cond)) (lambda () (body)))

Not terrible, and far better than earlier, but still not very clean. We'll do better shortly, but for now lets start with a pre-requisite that prevents the code above from actually working in any meaningful sense of the word:

We finally have to add support for using the arguments passed.

In our Ruby based syntax, the while function using function arguments looks like this:

 [:defun, :while, [:cond, :body],
    [:if, [:apply, :cond, []], [:do,
        [:apply, :body, []],
        [:while, :cond, :body]
      ],
      []
    ]
  ]

As a sidebar, indirectly this also means we're providing a hackish way of providing local variables. "Proper" local variables is largely syntactic sugar again. Most stuff in programming is syntactic sugar:

  (call (lambda (i) (code here)) (0))

The code above defines the local variable (i), visible to the code inside, and initializes it to 0. Not pretty, but as usual we defer pretty until later. As usual it's also highly inefficient, since it depends on creating a brand new function, but we'll deal with that later.

Adding function arguments

But lets move on and actually figure out what changes to make to access the function arguments. These preparations also pave the way for proper local variables and more down the line.

First we'll add a "scope" object, and do some minor refactoring. The "scope" defines which sets of variables are actually visible to use at any time.

class Function
  attr_reader :args,:body
  def initialize args,body
    @args = args
    @body = body
  end
end

class Scope def initialize compiler,func @c = compiler @func = func end

def get_arg a a = a.to_sym @func.args.each_with_index {|arg,i| return [:arg,i] if arg == a } return [:atom,a] end end

The reason we separate Function and Scope here is that I'll later introduce scopes that match other things than functions, such as classes etc.

Part of the refactoring mentioned is to thread the scope objects through the compiler, so that scope changes are transparent (by just passing the new scope down).

We hook Scope#get_arg in into Compiler#get_arg by changing:

   return [:atom, a] if (a.is_a?(Symbol))

into this:

   return scope.get_arg(a) if (a.is_a?(Symbol)) 

#output_functions changes into this, to start a new scope for each function:

 def output_functions
    @global_functions.each do |name,func|
     puts ".globl #{name}"
     puts ".type   #{name}, @function"
     puts "#{name}:"
     puts "\tpushl   %ebp"
     puts "\tmovl    %esp, %ebp"
     compile_exp(Scope.new(self,func,func.body)
     puts "\tleave"
     puts "\tret"
     puts "\t.size   #{name}, .-#{name}"
     puts
    end
  end   

And #compile_defun changes to create a proper Function object instead of just an array of arguments and the body:

  def compile_defun scope,name, args, body
    @global_functions[name] = Function.new(args,body)
    return [:subexpr]
  end 

Then we need to actually support accessing the arguments. Again we resort to "gcc -S" to find out how. This:

void bar(const char * str, unsigned long arg, unsigned long arg2)
{
  printf(str,arg,arg2);
}

turns into:

bar:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        movl    16(%ebp), %eax
        movl    %eax, 8(%esp)
        movl    12(%ebp), %eax
        movl    %eax, 4(%esp)
        movl    8(%ebp), %eax
        movl    %eax, (%esp)
        call    printf
        leave
        ret

As you can see, the arguments are accessed relative to %ebp, which has been loaded with a copy of %esp at the beginning of the function. Why the offset of 8 for the first argument? Well, the return address gets pushed onto the stack, creating an offset of 4, and the the old value of %ebp is pushed on, giving us an offset of 8 to access the arguments. Remember that %esp grows down in memory, which is why we're adding offsets to get past the last entries pushed onto the stack.

That leads to this addition to #compile_eval_arg as the last "if" check:

   if atype == :arg
      puts "\tmovl\t#{PTR_SIZE*(aparam+2)}(%ebp),%eax"
    end

Last but not least we create a Function object for "main" by modifying the call to #compile_exp in #compile_main:

   @main = Function.new([],[])
    compile_exp(Scope.new(self,@main),exp)

Time to test how it works:

prog = [:do,
  [:defun,:myputs,[:foo],[:puts,:foo]],
  [:myputs,"Demonstrating argument passing"],
]

Then:

$ ruby step7.rb >step7.s
$ make step7
cc    step7.s   -o step7
$ ./step7
Demonstrating argument passing

The latest version is here


2008-05-06 16:59 UTC Unholy: Converting Ruby 1.9 bytecode to Python bytecode

Posted in: , , ,
_why is at it again...

I can't quite agree with myself if this is seriously demented or tremendously cool... (Yes, it's mostly only an idea, with only the barest hint of doing anything "useful", but anyway..)


About me

E-mail: vidar@hokstad.com
Skype: vhokstad
View my LinkedIn profile

I was born April 21st, 1975, in Oslo, Norway. Since 2000 I've been living in London, UK. I'm married.

I'm working for Aardvark Media as Director of Technology. I'm also currently on the board of SpatialQ, a startup in the GIS space, and an advisor to Skoach, a startup doing a time management app for people with ADD.

Tags

(1) (2) (1) (3) (2) (3) (2) (15) (10) (3) (2) (2) (2) (2) (2) (3) (5) (2) (4) (2) (2) (2) (2) (2) (3) (4) (4) (4) (3) (30) (5) (2) (1) (33) (1) (2) (2) (4) (2) (3) (3) (2) (2) (1) (3) (2) (4) (2) (3) (2)

StumbleUpon My link page

(Links I have stumbled and like)