Writing a (Ruby) compiler in Ruby bottom up - step 32 2014-02-08


This is part of a series I started in March 2008 - you may want to go back and look at older parts if you're new to this series.

Testing And Debugging Code Generation

Unit testing code generation poses some particular challenges because so many things can go wrong, and they can go wrong in all kinds of horrible ways that makes them extremely nasty to track down,

If you've paid attention, you have seen some code generation tests (in compiler.feature) that will catch cases where the code blatantly crashes or returns completely wrong results. These can be really nasty to figure out.

But there are many far more insidious types of bugs that can lie latent and that often doesn't get triggered for small test programs:

And so on.

So let us make the code generation testing more robust, and use these new facilities to see both if we can catch some bugs, and make it easier to track down the causes. Hopefully we can clean the board and fix some of these issues too.

We will add a number of simple facilities to do more comprehensive testing:

This is by no means a comprehensive set of possible debugging help we can add, but combined it creates a good foundation to let us catch a whole series of bugs with code generation that might otherwise require a lot of manual inspection and writing very complicated tests.

To see what we can get out of these, we will try to make the rest of the compiler.feature tests run, and generally clean things up.

But first, lets make things harder for ourselves based on some notes I've taken about failing tests:

Tests and more tests.

Here's one that fails spectacularly (added as features/inputs/method.rb in 34e92a2)


    class Foo
    
      def initialize bar
        puts bar
      end
    
    end
    
    Foo.new("baz")

And this one (features/inputs/method2.rb in 34e92a2)


    
    class Foo
    
      def initialize
      end
    
      def foo
        puts "foo"
      end
    
      def bar arg
        puts arg
      end
    end
    
    f = Foo.new
    f.foo
    f.bar("bar")

But this one works (features/inputs/method3.rb in 34e92a2):


    class Foo
    
      def initialize
      end
    
      def foo bar,baz
        puts bar,baz
      end
    end
    
    f = Foo.new
    f.foo("bar","baz")

.. hinting at a problem with single argument methods, perhaps?

And this one hints at problems with the splat (*somearg) handling:


    class Foo
      def foo arg1,arg2
        %s(printf "foo: %d\n" numargs)
        %s(printf "%p\n" arg1)
        %s(printf "%p\n" arg2)
      end
    
      def bar *splat
        %s(printf "bar: %d\n" numargs)
        %s(printf "%p\n" (index splat 0))
        %s(printf "%p\n" (index splat 1))
        foo(*splat)
      end
    end
    
    f = Foo.new
    a = "foo"
    b = "bar"
    
    # Should give same pairs of output 3 times
    f.foo(a,b)
    f.bar(a,b)

This will get us started, combined with what is already there (try rake failing to see what's not completing)

Debugging in the face of crashes

For all of these we can get some limited help from gdb. A lot of help if we have the patience of a monk.

A run of method2 shows us some of the reasons for better test tools, but also gives plenty of hints:


    $ gdb /tmp/method2
    GNU gdb 6.8-debian
    Copyright (C) 2008 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "i486-linux-gnu"...
    (gdb) run
    Starting program: /tmp/method2
    foo
    WARNING: __send__ bypassing vtable not yet implemented.
    WARNING:    Called with (nil)
    WARNING:    self = 0x8ae0f68
    WARNING: __send__ bypassing vtable not yet implemented.
    WARNING:    Called with (nil)
    WARNING:    self = 0x8ae0f68
    WARNING: __send__ bypassing vtable not yet implemented.
    WARNING:    Called with (nil)
    WARNING:    self = 0x8ae0f68
    
    Program received signal SIGSEGV, Segmentation fault.
    0x0804c768 in __method_Object_puts () at /home/vidarh/src/compiler-experiments/writing-a-compiler-in-ruby/lib/core/object.rb:46
    46          while i < na
    (gdb)

In this case the real problem is not where the crash happens, but clearly earlier. The "foo" gets printed, so the problem likely arises after that, but well before __method_Object_puts even though that is where the crash happens, as we shouldn't be seeing the __send__ bypassing vtable not yet implemented. warnings.

(Note: If you see something different, that is perfectly normal - stack smashing bugs are notoriously tricky in that the smallest little thing can make the crash occur somewhere entirely different.)

So let's start off by fixing our--trace option and see what that tells us (with the caveat that it will change the code that is output, and so may change the character of the crash)

Trace

Our --trace option will just be a start. There's an endless amount of stuff we can trace. The tricky tradeoff will be to get the right balance to prevent us from spewing far much information. At some point we'll probably want to add more granular tracing functionality. For now, we'll experiment:

Here's a first stab (in b31e22ed8)


      def trace(pos,data)
        return false if !@trace
        
        # A bit ugly, but prevents infinite recursion
        # if we accidentally calls anything "traceworthy":
        @trace = false 
    
        @e.with_stack(2,true) do
            if pos
              pos = pos.short
            end
            sio = StringIO.new
            sio << "#{pos ? pos+":" : "  "} "
    
            if data.is_a?(String)
              sio << data << "\n"
            else
              print_sexp(data,sio,{:prune => true})
            end
            str = sio.string
            ret = compile_eval_arg(nil,str)
            @e.save_to_stack(ret,0)
            @e.movl([:stderr],:eax)
            @e.save_to_stack(:eax,1)
            @e.call("fputs")
          end
          @trace = true
      end

We allocate space for two slots on the stack, outputs the source position if provided, and either a string or an AST fragment serialized as %s() expressions.

Notice the way we go about the call. We use compile_eval_arg only to load the address of the formatted string, and then push it onto the stack, then put the address of the C library's stderr on, and call fputs, but without taking advantage of the compilers built in function call support.

My reason for this is simply to ensure the change of being affected by any weird bugs in other parts of the code generation - I want the generated trace code to be as simple and straight-forward and isolated as possible.

In 42c8dae I added support for the :prune option above, to make the trace output simpler.

In cc98262c4 you can see the added calls to the trace method.

On its own, this does not appear to really help all that much in tracking down the issues raised by the various failing tests.

But here's some example output from compiling features/input/method2.rb with --trace:


    class.rb, @38,5: (sexp (assign (index ob 0) self))
    class.rb, @38,5: (sexp (assign (index ob 0) self))
       (assign (index ob 0) self)
       (index ob 0)
       (callm ob initialize (
        (splat rest)))
       => callm :ob.:initialize
    
    string.rb, @8,3: (let (__env__ __tmp_proc)
      (sexp (assign @buffer 0))
    )
    string.rb, @8,3: (let (__env__ __tmp_proc)
      (sexp (assign @buffer 0))
    )
    string.rb, @16,5: (sexp (assign @buffer 0))
    string.rb, @16,5: (sexp (assign @buffer 0))
       (assign @buffer 0)
       <= callm ob.initialize
    
       <= callm String.new
    
       (callm s __set_raw (str))
       => callm :s.:__set_raw
    
    Segmentation fault

Clearly it can use some more love. But it does make it easier to follow what is actually leading up to the crash, and that might prove useful in finding the exact cause later on.

Adding --stackfence

Stack fencing is conceptually very simple: Before the start of the allocated region for a function call, push one or more items of data, at least one of which ought to be a "magic" value. This can either be a static randomly picked value that is reasonable unlikely to occur by accident (in other words, avoid low integer values), or it can be a value that is randomly chosen for each occurrence. We'll do the latter.

On return, after popping the arguments off the stack, we then want to compare the top of the stack against the magic value, before returning the stack to its original state. If the stack state is invalid, on the other hand, we'll want to output some trace data.

We add the stackfence method in dde1814:


      def stackfence
        if !@stackfence
          return yield
        end
    
        # Note: We don't care if this value is all that random.
        
        val = (rand * (2**32)).floor
        @e.pushl(val)
        r = yield
        @e.cmpl("$#{val}","(%esp)")
        l = @e.get_local
        @e.jz(l)
        trace(nil,"ERROR: Stack fence violation. Expected 0x#{val.to_s(16)}")
        @e.local(l)
        @e.addl(4,:esp)
        r
      end

In c806999 we simply then wrap stackfence do ... end around calls we want to fence.

Somewhat disappointingly, --stackfence doesn't actually help us with the current set of failing tests. At least not directly. But note that due to the nature of many stack trashing bugs, pushing a random value like this onto the stack will often cause crashes earlier (on next function return after the stack trashing), before the fence test is allowed to happen. We can still benefit from that by starting the program under gdb, and requesting a backtrace when the crash happens.

Adding %(saveregs)

So we move on to %(saveregs). We will continue to ignore portability for now (and let you have the fun on watching me clean up that mess a while down the line, as I have a planned part dealing with re-targeting the compiler to another CPU architecture) and just have saveregs dump the registers to a freshly allocated bit of memory with no indication of what they are.

Saveregs can be found in b86ee7c. I do not know if x86 has an instruction to bulk save registers (e.g. the M68k has "MOVEM" which can take a list of registers to copy), nor do I care. As usual the point is not to be efficient, but to get results easily, and quickly.

Note that we don't take care to ensure the registers are unchanged at the end of this.


    class Compiler
    
      # Debug instruction, to save registers 
      # 
      def compile_saveregs(scope)
        # First we push the registers on the stack, to ensure they won't get messed up
        # when we allocate memory.
    
        @e.pushl(:esp)
        @e.pushl(:ebp)
        @e.pushl(:edi)
        @e.pushl(:esi)
        @e.pushl(:edx)
        @e.pushl(:ecx)
        @e.pushl(:ebx)
        @e.pushl(:eax)
    
        # Allocate memory
        @e.pushl(8)
        @e.call("malloc")
        @e.addl(4,:esp)
    
        # We're naughty and assume we get memory:
        @e.popl(:ebx)
        @e.movl(:ebx,"(%eax)")
        @e.popl(:ebx)
        @e.movl(:ebx,"4(%eax)")
        @e.popl(:ebx)
        @e.movl(:ebx,"8(%eax)")
        @e.popl(:ebx)
        @e.movl(:ebx,"12(%eax)")
        @e.popl(:ebx)
        @e.movl(:ebx,"16(%eax)")
        @e.popl(:ebx)
        @e.movl(:ebx,"20(%eax)")
        @e.popl(:ebx)
        @e.movl(:ebx,"24(%eax)")
        @e.popl(:ebx)
        @e.movl(:ebx,"28(%eax)")
    
        [:subexpr]
      end
    end

Now that we have %s(saveregs) we can do this:


    %s(defun printregs (regs) (do
      (printf "eax: %08x, ebx: %08x, ecx: %08x, edx: %08x, esi: %08x, edi: %08x, ebp: %08x, esp: %08x\n"
       (index regs 0)
       (index regs 1)
       (index regs 2)
       (index regs 3)
       (index regs 4)
       (index regs 5)
       (index regs 6)
       (index regs 7))
    ))
    
    def foo
      %s(assign regs (saveregs))
      %s(printregs regs)
    end
    
    %s(assign regs (saveregs))
    %s(printregs regs)
    foo
    %s(assign regs (saveregs))
    %s(printregs regs)

This will dump registers 3 times - once before calling foo, one inside, and one after.

It may be tempting to define a function to save and dump the registers in one go, like this:

%s(defun dumpregs () (printregs (saveregs)))

Or even just do %s(printregs (saveregs))

But this won't work well. The registers will then be saved after we start preparing the stack and registers for the call to printregs, and so the values will not be the same as what we're looking for.

In the hypothetical dumpregs example, it's worse: We've pushed an entirely unrelated stack frame and modified registers just in order to call dumpregs, and then another frame to prepare to call printregs.

Lets take a quick look at the output:


    eax: 0804d47d, ebx: 00000003, ecx: bfb354c8, edx: 0823b184, esi: 080524b0, edi: 08048510, ebp: bfb354d8, esp: bfb354d4
    eax: 0804d47d, ebx: 00000001, ecx: 0823af98, edx: b76ec0f0, esi: 080524b0, edi: 08048510, ebp: bfb354c4, esp: bfb354b0
    eax: 00000077, ebx: 00000009, ecx: 00000000, edx: b76ec0f0, esi: 080524b0, edi: 08048510, ebp: bfb354d8, esp: bfb354d4

There's not that much we will learn from this example. Many of these registers gets trashed regularly. ebp and esp however, won't. In this case we do see that ebp and esp both are unchanged when "outside" foo. That's good - it means the call to foo at the very least returns the stack and local variable frames to what they should be (we index local variables and arguments relative to ebp, remember, as esp frequently changes inside a function).

But lets augment the output. This prints out 32 bytes at (%esp), 4(%esp) etc. and (%ebp), 4(%ebp) etc., giving us a quick way of looking at the arguments and return address of the current function (via ebp) and current state of the stack respectively (via esp).

I first added printregs to lib/core/core.rb as a utility (in 066b8e6), and then augmented it as follows in 20b4abc:


    --- a/lib/core/core.rb
    +++ b/lib/core/core.rb
    @@ -78,4 +78,27 @@ false = 0    # FIXME: Should be an object of FalseClass
        (index regs 5)
        (index regs 6)
        (index regs 7))
    +
    +  (assign sp (index regs 6))
    +  (printf "(ebp): %08x, %08x, %08x, %08x, %08x, %08x, %08x, %08x\n"
    +   (index sp 0)
    +   (index sp 1)
    +   (index sp 2)
    +   (index sp 3)
    +   (index sp 4)
    +   (index sp 5)
    +   (index sp 6)
    +   (index sp 7))
    +
    +  (assign sp (index regs 7))
    +  (printf "(esp): %08x, %08x, %08x, %08x, %08x, %08x, %08x, %08x\n"
    +   (index sp 0)
    +   (index sp 1)
    +   (index sp 2)
    +   (index sp 3)
    +   (index sp 4)
    +   (index sp 5)
    +   (index sp 6)
    +   (index sp 7))
    +
     ))

In practice - take 1

Let's try to make use of this to track down our regressions. Lets start with features/inputs/method2.rb

First, let's rewrite bar like this:


      def bar arg
          %s(assign r (saveregs))
          %s(printregs r)
          puts arg
      end

I get a bunch of nonsense, and then the register/stack output:


    foo
    WARNING: __send__ bypassing vtable not yet implemented.
    WARNING:    Called with (nil)
    WARNING:    self = 0x8665f68
    WARNING: __send__ bypassing vtable not yet implemented.
    WARNING:    Called with (nil)
    WARNING:    self = 0x8665f68
    WARNING: __send__ bypassing vtable not yet implemented.
    WARNING:    Called with (nil)
    WARNING:    self = 0x8665f68
    eax: 0804e234, ebx: 00000004, ecx: 0866ce28, edx: b76f70f0, esi: 080503d0, edi: 080484c0, ebp: bfc51aa8, esp: bfc51a94
    (ebp): bfc51ac8, 0804a917, 0866d600, 00000000, 00000000, 00000000, 00000000, bfc51ae0
    (esp): 0804a902, 08665f68, 0866d6a0, 0866d6c0, 00000000, bfc51ac8, 0804a917, 0866d600
    Segmentation fault

How do we interpret this? Well. The first curious thing is ebx. Why does it contain "4"? We use ebx to contain the number of arguments passed, and while we need to account for self and an optional block argument, that does not add up to 4 together with arg.

Also note the (ebp) line, which shows what was on the stack right after the start of the method. The top entry is the old value of %ebp. The next one is the return address. Then comes?

We should expect the arguments. But there's just one non-null value, and we expect more. This one does look quite nasty. Nasty enough to look at the asm output again for a change (it is ugly, and in dire need of some love - we'll definitively set aside a part for that soon).

I've added comments to this to explain rather than break up the code segment:


        .stabn  68,0,24,.LM229
    .LM229:
        # callm :f.:bar
        subl    $20, %esp
        movl    $4, %ebx    # This is the offending number of arguments
        pushl   %ebx        # It's being pushed, so that in absence of proper
                            # proper register allocation, we're pre-emptively
                            # making sure it's not trashed. See (1) below
        movl    f, %eax     
        movl    %eax, 4(%esp) # "f" will be "self" inside the method call
        movl    $0, 8(%esp)   # We don't have a block to pass
        # callm :self.:sexp   # Uhm. WTF?!? (2)
        subl    $20, %esp     # Somehow the compiler got the idea we're trying to call 
                              # "sexp" as a method on self.
        movl    $3, %ebx      # ... and that it has arguments

(1) the reason we don't delay loading %ebx until right before the call is that in the case of "splat" handling (*somevar), the value is dynamically created. We should optimize this for the normal case of no splat, but as always performance is low priority.

(2) This points to a weaknes in the %s() syntax: If we add a set of parentheses too many, we'll try to call the return value of an expression. If we have one too little, we'll treat something as an explicit value instead of calling it to obtain a value. Lets take a look at the parse tree.


       (callm f bar (sexp (call __get_string .L1)))

... and this appears to be the likely culprit - a missing set of parentheses surrounding the sexp node.

This is curious. If you look at rewrite_strconst, we explicitly have a line with a "FIXME" in front talking about working around a parser inconsistency. But look a few lines up. is_call determines if this change is applied, and is_call doesn't trigger for method calls, just low level function calls. So we do this (in 245c016):


       def rewrite_strconst(exp)
         exp.depth_first do |e|
           next :skip if e[0] == :sexp
    -      is_call = e[0] == :call
    +      is_call = e[0] == :call || e[0] == :callm
           e.each_with_index do |s,i|
             if s.is_a?(String)
               lab = @string_constants[s]

And method2 now works:


    foo
    eax: 0804e0e6, ebx: 00000003, ecx: 0805059f, edx: 095c3e28, esi: 08050280, edi: 080484c0, ebp: bf7ff3f8, esp: bf7ff3e4
    (ebp): bf7ff418, 0804a857, 095c4600, 00000000, 095c4690, 0804a6d6, 00000000, bf7ff430
    (esp): 00000000, 095c4690, 00000001, 095c46a0, 0804a842, bf7ff418, 0804a857, 095c4600
    bar

How does this frame look? Much better. ebx is 3, which is (self, the optional empty block argument, and arg). I we look at the (ebp) line, we see the old %ebp, the return address, and then arguments: self, 0 for the block argument, and another address which presumably (since the code worked) is the string object for "bar".

(I don't like "presumably" - we can do better by expanding the printargs function to recognize types and dump more extensive data; for now I'll leave that as an exercise to the reader...)

In practice - features/inputs/method.rb

method.rb segmentation faults for me, and so I tried it under gdb:


    $ gdb /tmp/method
    GNU gdb 6.8-debian
    Copyright (C) 2008 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "i486-linux-gnu"...
    (gdb) run
    Starting program: /tmp/method
    
    Program received signal SIGSEGV, Segmentation fault.
    0x0804c90c in __method_Object_puts () at /home/vidarh/src/compiler-experiments/writing-a-compiler-in-ruby/lib/core/object.rb:46
    46          while i < na
    (gdb) bt
    #0  0x0804c90c in __method_Object_puts () at /home/vidarh/src/compiler-experiments/writing-a-compiler-in-ruby/lib/core/object.rb:46
    #1  0x0804d98d in __method_Foo_initialize () at /home/vidarh/src/compiler-experiments/writing-a-compiler-in-ruby/features/inputs/method.rb:6
    #2  0x0804e4cc in __method_Class_new () at /home/vidarh/src/compiler-experiments/writing-a-compiler-in-ruby/lib/core/class.rb:38
    #3  0x0804a79c in main () at method.rb:12

When instrumented with saverags/printregs in Foo#initialize, I get 5 in ebx... Yikes. --parsetree doesn't reveal any obvious problems to me. So another look at assembler output. Oddly enough, that looks fairly sane. But here's the thing: we of course don't call initialize directly - it's called by Class#new. So the next step is to instrument that too with saveregs.

Unfortunately my tries with that blows up badly. Really badly. Clearly we're close. And it's yet again the assembler output we'll want to look at, I suspect. This time of Class#new, specifically the part where Class#new calls ob.initialize(*rest).

I've annotated it inline:


        # callm :ob.:initialize
        # This part is generated by handle_splat:
        # *rest
        
        # UhOh (2)
        movl    -4(%ebp), %eax   # We get the number of arguments
        sall    $2, %eax         # Multiply it by 4 to get the size
        subl    %eax, %esp       # Allocate enough space to copy it onto the stack again
        movl    %eax, %edx       # Make a copy of the number of bytes
        leal    16(%ebp), %eax   # This is where "rest" is located on the stack
        addl    %eax, %edx       # We add it to our number of bytes, effectively getting the last adress + 1
        movl    %esp, %ecx       # Make a copy of the stack pointer we can use to iterate over
    .L144:
        movl    (%eax), %ebx     # Copy from rest...
        movl    %ebx, (%ecx)     # ..to "new rest"
        addl    $4, %eax         # Bump source address
        addl    $4, %ecx         # Bump destination address
        cmpl    %eax, %edx       # At end?
        jne .L144                # If not, jump back to .L144
        
        # This is a hairy way of getting the number of arguments back:
        subl    %esp, %ecx 
        sarl    $2, %ecx
        
        # *rest end
    
        # UhOh (1)
        subl    $20, %esp        # We allocate 32 bytes (hex $20) on the stack
        movl    $2, %ebx         # Number of arguments  before the *rest
        addl    %ecx, %ebx       # Add number of arguments from *rest
        pushl   %ebx             # Save for later
        movl    -16(%ebp), %eax  # Copy ob to %eax
        movl    %eax, 4(%esp)    # Save ob to stack, above the %ebx we pushed
        movl    $0, 8(%esp)      # Save optional block argument
        popl    %ebx             # Remove %ebx (ob is now at (%esp))
        movl    (%esp), %ecx     # Copy ob into %ecx
        movl    (%ecx), %ecx     # Copy class pointer into %ecx
        movl    40(%ecx), %eax   # Copy vtable entry for #initialize into %eax
        call    *%eax            # call ob.initialize
        addl    $20, %esp        # free stack
        
        # UhOh (3)
        movl    -4(%ebp), %eax   # get numargs
        sall    $2, %eax         # Multiply by 4
        addl    %eax, %esp       # Free splat arguments.
        
        # callm ob.initialize END

Going through it like this suddenly makes it clearer... Or maybe not?

See (1) above. Note that at this point we've spent a lot of time putting the splat arguments onto the stack. Then we go ahead and subtract 32 bytes from %esp even though we're only using 8 for the two arguments. Yikes.

Let's look at Emitter#with_stack which is responsible for this stack allocation:


      def with_stack(args, numargs = false)
        # We normally aim to make the stack frame aligned to 16
        # bytes. This however fails in the presence of the splat operator
        # If a splat is present, we instead allocate exact space, and use
        # %ebx to adjust %esp back again afterwards
        adj = PTR_SIZE + (((args+0.5)*PTR_SIZE/(4.0*PTR_SIZE)).round) * (4*PTR_SIZE)
        
        # If we're messing with the stack, any registers marked for saving will be
        # saved to avoid having to mess with the stack offsets later
            
        if @save_register && @save_register.size > 0
          @save_register.each do |r|
            if r[1] == false
              @out.emit(:pushl,r[0])
              r[1] = true
            end
          end
        end
    
        subl(adj,:esp)
        movl(args, :ebx) if numargs
        yield
        addl(adj, :esp)
      end

The comment looks promising. Except we've forgotten to take care of this.

Thankfully this is easily fixed (in b32ba75)


    -    adj = PTR_SIZE + (((args+0.5)*PTR_SIZE/(4.0*PTR_SIZE)).round) * (4*PTR_SIZE)
    +    if numargs
    +      adj = PTR_SIZE * args
    +    else
    +      adj = PTR_SIZE + (((args+0.5)*PTR_SIZE/(4.0*PTR_SIZE)).round) * (4*PTR_SIZE)
    +    end

But that doesn't fix the odd value in %ebx. Until we look at (2) above (near the top). numargs gives us the number of arguments total. But we're only copying rest.

Later, we actually want the number of elements in rest and that's an entirely different thing.

We can fix that by accounting for the number of fixed arguments (in bcb7668):


    --- a/compiler.rb
    +++ b/compiler.rb
    @@ -457,6 +457,7 @@ class Compiler
         # (needs proper register allocation)
                          @e.comment("*#{args.last.last.to_s}")
         reg = compile_eval_arg(scope,:numargs)
    +    @e.subl(args.size-1,reg)
         @e.sall(2,reg)
         @e.subl(reg,:esp)
         @e.movl(reg,:edx) 
    @@ -472,6 +473,7 @@ class Compiler
         @e.jne(l)
         @e.subl(:esp,:ecx)
         @e.sarl(2,:ecx)
    +    @e.subl(1,:ecx)

Note that this is still one huge hack, in that we're only handling splat properly in a very small subset of cases where it can be used.

That leaves us with (3): We overwrite the return value from #initialize. Of course in this specific case that doesn't matter, as Class.new will ignore that return value. But it's serious in other situations, so lets concoct up a test case.

This is features/inputs/splatreturn.rb:


    def foo(a)
      "Hey!"
    end
    
    def splat *args
      foo(*args)
    end
    
    puts(splat(1))

We expect it to return "Hey!", but it crashes because of the bug we've already found.

We fix it with yet another messy change to the splat handling:


         if splat
    +      @e.pushl(:eax)
           reg = compile_eval_arg(scope,:numargs)
           @e.sall(2,reg)
    -      @e.addl(reg,:esp)
    +      # We assume :ebx has been trashed at this point anyway
    +      @e.movl(reg,:ebx)
    +      @e.popl(:eax)
    +      @e.addl(:ebx,:esp)
         end

The issue is that we need to safeguard :eax and we don't want to use with_register as that might in the future spill registers to the stack. So we push eax onto the stack, and then we want to adjust the stack... But then %eax will be somewhere we don't want it. So we use %ebx which at this point has likely been trashed (we'll reinitialize it from the stack if needed anyway).

But... We're not home free yet. We've changed handle_splat to subtract the size of the pre-existing address list. So we need to apply that here too.

Here's finally a great test case for --stackfence. With that option, this test program gives me:


    $ /tmp/splatreturn
    [...snip...]
       ERROR: Stack fence violation. Expected 0x29fd8512
       ERROR: Stack fence violation. Expected 0x29fd8512
       ERROR: Stack fence violation. Expected 0x29fd8512
    Hey!
       ERROR: Stack fence violation. Expected 0x29fd8512
       ERROR: Stack fence violation. Expected 0x29fd8512

So lets adjust the stack adjustment again. Notice the @e.subl(args.size,reg) (in 8802ce5):


         if splat
    +      @e.pushl(:eax)
           reg = compile_eval_arg(scope,:numargs)
    +      @e.subl(args.size,reg)
           @e.sall(2,reg)
    -      @e.addl(reg,:esp)
    +      # We assume :ebx has been trashed at this point anyway
    +      @e.movl(reg,:ebx)
    +      @e.popl(:eax)
    +      @e.addl(:ebx,:esp)
         end

Final words, and next time

Debugging code generation is hairy. Unlike ordinary debugging you not only deal with broken higher level logic, but all bets are off with respect to what the code actually does. I hope this part has given some ideas about approaches and tools. But it's just a start, and frankly this part has taken me a lot longer to write than I expected, as I tried to ensure I accurately reflected the process of tracking down these bugs and ran into stumbling block after stumbling block.

Ultimately, it is down to tracking values from register to register, and memory location to memory location in some cases.

Anyway. Next time we're moving on to register allocation. There's a lot of hairy code here by now that makes assumptions about register usage that may or may not work only due to luck.

So first order of business is to walk through our calling conventions and document the register usage. Secondly, to create a more sound foundation for using registers, and leaving choice of registers to the code generation backend (and hopefully reducing the dependencies on x86 in the process, though we still have x86 code littered all over the place).


blog comments powered by Disqus