Vidar Hokstad V2.0

Home Blog

Tag: programming

2008-05-03 17:43 UTC Writing a compiler in Ruby bottom up - step 6

This is the sixth in a series. Please start on part 1 first if you haven't already to get the background, or see here for the full series so far

Since we established last time that I'm going to blatantly steal stuff from Lisp, Scheme and friends (as well as from all over the place - I feel no need to be original with this project... Not until we're much further along, anyway), now is the right time to start introducing some more powerful concepts.

How about some deferred evaluation and anonymous functions?

Lambdas, or anonymous functions, can be passed around like values and called at your convenience (or not at all). Generally, they can access variables from the surrounding scope that gets "bound" to the function as an environment that allows it to pass state. That's a closure. What we're going to do this time is not going to bring full closure support, but it's the start, and we'll get to full closures down the road.

Let's be clear: As almost everything in programming languages, they are syntactic sugar. You can consider them equivalent to defining a class with a single method to call it and instance variables to be the environment, for example (or you can treat closures as a way of building an object system instead - a concept explored by Wouter van Oortmerssen who has been a hero of mine ever since I discovered Amiga E. If you're a programming language geek you need to take a look at the stuff Wouter has done over the years) - many concepts are reasonably orthogonal.

(As further digression, have you noticed a lot of people like me who are obsessed with programming languages that have this horrible tendency to nest sentence fragments and abuse punctuation to interject random thoughts all over the place, or is that just me?)

But instead of having to go to all that trouble, and litter your namespace with small functions, lambdas let you define them inline, and return a value representing the function instead of executing it.

For now, we'll add these constructs:

(lambda (args) body)

(call f (args))

The former will return the function - for now just the address - instead of executing the body. "call" will call the address passed with the arguments it's given, no surprise there.

So how does this differ from "real" closures?

The tricky part about real closures is that if you refer to variables from the surrounding scope, they are supposed to "stay live" for the next invocation. That is, they are meant to be kept around whenever you call the closure later on. It should be clear that returning just the address to the function isn't sufficient to achieve that. So let's look at one way of doing it, so you get an idea of the work involved (it's not huge, but not trivial either):

  • We need to create an "environment", namely somewhere to store the variables that should survive past the lifetime of the function the closure is created in. This environment must exist on the heap, and each invocation of the function surrounding the lambda must result in a new environment.
  • What is returned must allow you to access the environment. You can do this by creating an "object" and make the environment represent instance variables, or you can use a "thunk" - a small generated function that contains a pointer to the object and loads it into a predefined location before calling the anonymous function, or any number of other ways.
  • You need to decide what variables go into the environment. This could be either "everything" accessible at the time of the execution of the "lambda" expression, or you could scan the expression body of the lambda and explicitly put only the ones used by one or more lambda expression in the function in the environment. The latter is potentially a lot more space efficient, but is slightly more work.

Ok, so for now just lets get the "anonymous function" in place. As usual I'll go through the changes step by step, but note that I've done some minor cleanups that are inconsequential to these changes as well, that I won't go through because they'll lust clutter things up.

The first piece is the code to handle the "lambda" expression:

  def compile_lambda args, body
    name = "lambda__#{@seq}"
    @seq += 1
    compile_defun(name, args,body)
    puts "\tmovl\t$#{name},%eax"
    return [:subexpr]
  end

Hopefully this is pretty self-explanatory. All it does is create a function name on the form lambda__[number] that we'll use to refer to the function, since we won't actually output it inline. Nothing really prevents us from spitting it out inline, but I find it messy, so for now anyway, I'll treat it as just another function. We then call compile_defun to generate just another named function - so it's only anonymous for the user. We then move the address of the function into %eax, which is where we've established the result of our subexpressions go. Note that this is yet another shortcut - eventually we'll need to do something more sophisticated to handle more complex expressions properly, but register allocation is a complex subject (and the alternative, to push everything onto the stack works but is slow).

Last we return [:subexpr] to indicate to the caller where/how to find the result of this expression.

Next we'll do some refactoring. You may have noticed that the inner loop of #compile_exp was getting ugly, handling different types of arguments. So we extract this:

  def compile_eval_arg arg
    atype, aparam = get_arg(arg)
    return "$.LC#{aparam}" if atype == :strconst
    return "$#{aparam}" if atype == :int
    return aparam.to_s if atype == :atom
    return "%eax"
  end

Notice the appearance of :atom, too. This allows us to take the address of C functions and pass them to :call. It was so simple to add, I figured why not. To go with that, we add this to #get_arg:

   return [:atom, a] if (a.is_a?(Symbol))

Next up, as part of the refactoring, :call falls out almost by itself:

  def compile_call func, args
    stack_adjustment = PTR_SIZE + (((args.length+0.5)*PTR_SIZE/(4.0*PTR_SIZE)).round) * (4*PTR_SIZE)

puts "\tsubl\t$#{stack_adjustment}, %esp" args.each_with_index do |a,i| param = compile_eval_arg(a) puts "\tmovl\t#{param},#{i>0 ? i*4 : ""}(%esp)" end

res = compile_eval_arg(func) res = "*%eax" if res == "%eax" # Ugly. Would be nicer to retain some knowledge of what "res" contains puts "\tcall\t#{res}" puts "\taddl\t$#{stack_adjustment}, %esp" return [:subexpr] end

This might look familiar. It's because it's the guts of #compile_exp, with #compile_eval_arg replacing some of the more ugly bits. The main other change is that it calls compile_eval_arg to get the function too, and does something weird to "%eax" - it prepends a "*".

Either you're confused now, or you might start seeing possibilities - both of doing nice things and of shooting yourself in the foot. The above is the equivalent of casting an arbitrary expression into a pointer and jumping to it with no validation. It makes it deliciously easy to cause a segfault by jumping to random addresses. It also, incidentally, makes things like a virtual pointer table for a class system trivially easy, and a number of other things. Safety will have to come later. As it turns out, the "*" is needed in front of indirect "call"'s.

So what does #compile_exp look like now? The obvious answer is "a hell of a lot cleaner":

 def compile_do(*exp)
    exp.each { |e| compile_exp(e) }
    return [:subexpr]
  end

def compile_exp(exp) return if !exp || exp.size == 0 return compile_do(*exp[1..-1]) if exp[0] == :do return compile_defun(*exp[1..-1]) if (exp[0] == :defun) return compile_ifelse(*exp[1..-1]) if (exp[0] == :if) return compile_lambda(*exp[1..-1]) if (exp[0] == :lambda) return compile_call(exp[1],exp[2]) if (exp[0] == :call) return compile_call(exp[0],exp[1..-1]) end

Nice, isn't it? Compile_call, as it turns out does almost exactly the same as what the guts of compile_exp used to do, apart from what was farmed out to the other functions.

So we do a little test:

prog = [:do,
  [:call, [:lambda, [], [:puts, "Test"]], [] ]
]

(yeah, not exactly earth-shattering)

Compile and run:

$ ruby step6.rb >step6.s
$ make step6
cc    step6.s   -o step6
$ ./step6
Test

The full code for this step is here

Following parts

Since I combined three parts last time, here's an updated list of the remaining pre-written "just-in-need-of-cleanup" parts - I guess I better free up some time to write some new stuff soon, as I'm fairly sure I might end up combining a couple of the parts below when I get around to cleaning them up (such much for posting it as 30 parts...)

  • Step 7: Revisiting loops with anonymous functions, accessing function arguments
  • Step 8: Adding assignment and basic arithmetic
  • Step 9: A cleaner "while" loop
  • Step 10: Testing the language: Writing a simple text mangling program to make input cleaner
  • Step 11: Refactoring the code generation and starting to abstract out the target architecture
  • Step 12: Discussion of various concepts and future direction
  • Step 13: Adding arrays
  • Step 14: Local variables and multiple scopes
  • Step 15: Accessing variable length arguments
  • Step 16: Revisiting the text mangler - cleanups to test new functionality, and run it on itself
  • Step 17: Identifying the constructs needed to self host the compiler
  • Step 18: A start on a proper parser


2008-05-01 18:34 UTC How to sell and not sell a new programming language

"Everyone" that's geeky enough sooner or later at least toy with the idea of their own language (or like me write several half-assed toy compilers to test concepts - hopefully I'll actually see my latest one all the way through).

But what keeps amazing me is how new languages tends to get popular largely by chance. Many good languages have languished because the inventor simply did not "sell" the language right. There are a lot of languages far better suited for Java's niche than Java, for example. But Java was sold right:

The right people to target are not other compiler writers. The right people are ordinary programmers that might try out and comment on how your language works in the "real world". Far too many languages are "sold" to other compiler writers or advanced programmers based on features the average programmer might have run across once or twice but probably doesn't even know the formal name of.

Here are two examples I stumbled across, from the same language, of how to attract and not attract ordinary programmers.

The bad one first. The web page for The Pure Programming Language starts like this:

Pure is a functional programming language based on term rewriting. It has a modern syntax featuring curried function applications, lexical closures and equational definitions with pattern matching, and thus is somewhat similar to languages of the Haskell and ML variety. But Pure is also a very dynamic and reflective language, and is more like Lisp in this respect. The interpreter has an LLVM backend to do JIT compilation, hence programs run blazingly fast and interfacing to C modules is easy.

It got me slightly interested. But I like looking at new languages. It only got even me slightly interested, though. It throws a lot of buzzwords around; good ones. But what it doesn't do it say anything about why you should care.

Programmers looking at a new programming language tends to ask why they should use a new language, and what it will give them.

The firstThe second example is a file containing lots of small examples of the Pure language. I'm not convinced, but seeing a few of those snippets does a hell of a lot more for me than the paragraph quoted above..

This isn't to put down Pure or it's author. It doesn't look revolutionary, but it looks far more interesting than a lot of other new languages I've seen. It's just that seeing web pages for new programming languages that are not well known without the examples featured as one of the most prominent features is almost as frustrating to me as seeing web pages for desktop applications without a screenshot staring me in the face as soon as the page loads (hint: I'm not going to read through several paragraphs about a new IM client without seeing a screenshot first - most apps are bad and in niches where better apps are dime a dozen, and a screenshot is a good first filter, just as examples are with languages).


2008-04-30 17:25 UTC Software ICs: Reuse should not always mean inheritance or configuration

Inheritance or configuration options has a cost in terms of increased complexity that can in some cases with advantage be avoided by maintaining multiple versions of the component and adding new features to new branches instead of continuing to work on a single code base, in the same way integrated circuits often exist in a wide range of similar, static, models with the same basic functionality. Better merging support in modern version control systems make this model increasingly viable for software.


One thing I've thought a lot about over the year is why software reuse is so hard.

A big problem is that designing reusable software when you don't know where it might be reused is hard

Over the years, a number of people have brought up integrated circuits as a model for software reuse. I tried to find one of the old articles I read about it this morning, but was unfortunately unable to track it down. But this is not an original idea.

Simple ICs have a number of properties that affect how they are used:

  • When they're "complete" they're often never changed other than possibly to fix problems. The design may evolve, but the next "version" tends to be given a new designation and is often treated as a separate product.
  • There's often a myriad of different versions with smaller or larger differences - many products exists in variation rather than being configurable. Configurability often adds complexity. In hardware, complexity has a very visible impact.
  • Apart from very large complex general purpose processors, most ICs tend to have a very high cohesion, because they have to in order to make financial sense.
  • They are "black boxes" in that you can't (or won't) change them, but the details of how you interface with them and how they will respond is well documented and wel understood.

In the software world, meanwhile, we keep trying to design reusable libraries, components and services, and a lot of the time we end up with incredibly complex APIs, because we try to prepare for every eventuality.

The result is both more code (that needs to be tested, and that takes up memory) and abstractions that aren't needed for the core functionality, but that needs to be there to facilitate the configurability (which may add significant overheads).

Why should we put up with this?

Distributed version control and reuse

One thing that struck me this morning was that one of the big features of distributed version control systems promise is to ease the burden of merging, and that this is a major stepping stone towards a simpler model of reuse.

First of all, let me say that I am not against configurable components. I strongly believe in making classes and libraries generic and reusable in itself - specifically by ensuring low coupling and high cohesion. However, sometimes making a component highly flexible comes at the cost of reducing cohesion, of making the component try to please everyone at the same time by exposing interfaces that requires massively increased complexity in order to avoid exposing internal implementation details, or where the choice is taken to "surrender" and expose the guts of the component for everyone to hook into.

Both alternatives are bad.

The "software IC" idea taken to it's ultimate conclusion is this:

Develop strongly cohesive components that export generic interfaces to ensure loose coupling, and "freeze" those components - refuse to add any more features or make any interface changes or adapt it. Limit changes to internals that don't change the observed behavior other than fixing bugs and improving performance characteristics.

It's both incredibly powerful, and at first glance incredibly limiting.

Powerful because it means that when you learn a specific "model" of a component, you have every reason to believe it won't break on you. Imagine linking to the same specific version of a library and never upgrading other than selectively for bug fixes.

Incredibly limiting because software people have a feature fetish. We crave adding functionality, and go all "ohh, shiny" whenever we see something cool has been added. And that's fair, at least when it actually is helpful.

I don't want to stop that. I want to take a much more conscious approach to the fact that when DingbatShell goes from version 1.x to 2.x it's a different model - a different product - than the previous version. Upgrading, even if the API seems to stay mostly backwards compatible, requires new rounds of testing and careful review.

Software IC's aren't new - they're called branches and versions

This is the crux of the matter. You've been able to do this "forever" - and some have done. But very few take the conscious approach that this applies to the whole stack, including third party libraries, build tools that have any kind of effect on the final product etc.

Even fewer extend this to creating a multitude of branches - a new branch for every major "niche" the component is meant to work in, or every major axis of configurability.

A key reason being that in the age of version control systems that have been abysmally bad at merging changes, you really don't want to have to merge in a bug fix across 42 different versions of a component.

I'm not sure we're still quite there yet, but that's almost what I'm proposing. A vital point being that such changes should be exceedingly rare exactly because you freeze features regularly, branch of new components, and continue new feature development while leaving the branches frozen.

Only for critical bug fixes would you be faced with a potentially massive merge job. But if the components remain small and simple that merge job might not be so bad.

This is of course where the new breed of distributed version control systems comes in. Because they're distributed, better merging has been vital. A system like GIT is heavily focused around a workflow that for many users involves frequent multi-way merges of a very high degree of complexity.

We're finally getting tools that are actually specifically geared towards managing large number of branches.

What are the benefits?

Whenever there's a high cost to providing configurability, either in increasing complexity or reducing performance due to complex abstractions you have a point where it's worth considering a new "model".

You can:

  • Simplify the API - configuration options that are needed for only one or the other axis of configuration (say using a database vs a set of files as the data source, if the nature of the component is such that it's always either or) can be left out entirely. A good test for whether splitting a component into branches is a good thing is to look at how large parts of the API you can prune away or how many arguments you can remove from methods.
  • Improve performance by hardwiring logic that might otherwise go through multiple level of indirection.
  • Massively simplify testing, because the number of permutations of configurations may drop significantly (look for m*n effects, where configuration happens along more than one "axis" and where many combination may not make sense, but will still need to work - if branching the component can make testing against a single axis at the time it may be a big win).

This doesn't work for end users and not always even for developers

Imagine if an end user had to look through a catalogue to see which model of Gimp had exactly the features they want. It's not going to happen.

This is an approach for developers, and even then, it is an approach for relatively small, highly cohesive, components.

It is not a panacea. It is not always appropriate.

It's yet another tool, and an approach that I personally will start considering more seriously whenever I get to a point where I want to add more features.

But are you using it?

I have ranted about why I don't like frameworks before, and written a number of small Rack handlers for example, and the direction I'm increasingly taking for web development is to compose applications of small components designed to be extremely small, cohesive and loosely coupled.

That's the ideal scenario for "software ICs". Rather than adding more features to my dispatch class that I will quite possibly only use for a fraction of the apps I write, I will make a branch, add those features to separate branches, and pick and choose, keeping whichever version I pick for a new web app extremely simple. If I need a feature from another "model", I'll make another branch from whichever version is the closest match, and merge in the feature I want.

Currently, the dispatch class I use for this blog, for example, is about 20 lines. It doesn't need to be more. I just butchered it and removed most of the features it used to have because I realize that for this app those features were just cruft and bugs waiting to happen. I keep the code around - when/where I do need it, it's there in my repository.

The same is true for other components I use.

It reduces the need for documentation, even, because untangling the features from eachother have resulted in components that are so trivial to understand that the code does as good a job as a description, and is guaranteed to be much more precise.


2008-04-28 21:31 UTC Customizing the Ruby syntax highlighter for x86 assembler

I wrote about syntax hightlighting in Ruby earlier. The Ruby Syntax library supports Ruby, YAML and XML out of the box. But it's also pretty easy to extend to handle other languages.

Since I've been writing my compiler in Ruby series and including a lot of x86 assembler, I figured I'd see how much (or little) work adding a syntax highlighter for assembler would take.

It's by no means perfect - I've only spent half an hour or so throwing this together, but it's reasonable, and easy enough to keep adjusting.

Here it is:

require 'rubygems'
require 'syntax'

class AsmTokenizer < Syntax::Tokenizer def setup @state = :newline end

def step @state = :newline if bol? if @state == :newline # Handle labels and operands if label = scan(/[a-zA-Z.][a-zA-z0-9_]*:/) then start_group :label, label elsif words = scan(/\.[a-zA-Z0-9_]*/) start_group :directive, words @state = :operands elsif words = scan(/[a-zA-Z]+/) start_group :operator, words @state = :operands else start_group(:normal, getch) end else # Handle operators and assorted punctuation

if words = scan(/,/) then start_group :comma, words elsif words = scan(/[\-0-9][0-9]*/) then start_group :number, words elsif words = scan(/%[a-zA-Z]+/) then start_group :register, words elsif words = scan(/[\.a-zA-Z][a-zA-Z0-9]*/) then start_group :label, words elsif words = scan(/\$/) then start_group :value, words elsif words = scan(/[\(\)]/) then start_group :paren, words elsif words = scan(/\".*\"/) then start_group :quoted, words else start_group(:normal, getch) end end end end

# Register the custom highlighter Syntax::SYNTAX['asm'] = AsmTokenizer

I don't have time do write a lot of explanation - the Syntax manual does a reasonable job of describing it. The one pitfall to be aware of, is that you must make sure that your #step method advances at least one character no matter what, or you'll get stuck in an infinite loop.

Here's an example of how to use the new highlighter:

require 'syntax/convertors/html'

convertor = Syntax::Convertors::HTML.for_syntax "asm" puts convertor.convert( File.read("/tmp/step5.s"))

And here's some CSS to color the output:

body { background: black; }                                                                                                          
.directive { color: purple; }                                                                                                        
.comma     { color: white; }                                                                                                         
.paren     { color: white; }                                                                                                         
.value     { color: white; }                                                                                                         
.number    { color: yellow; }                                                                                                        
.label     { color: blue; }                                                                                                          
.register  { color: brown; }                                                                                                         
.operator  { color: lightgrey; }                                                                                                     
.quoted    { color: green; }                                                                                                         

And some example output from the compiler series:

        .text
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $4,%esp
        movl    $.LC0,(%esp)
        call    puts
        addl    $4, %esp
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .section        .rodata
.LC0:
        .string "Hello World"


2008-04-27 19:31 UTC Writing a compiler in Ruby bottom up - step 5

This is the fifth in a series. Please start on part 1 first if you haven't already to get the background, or see here for the full series so far

Last time I promised to post these more frequently, and blatantly failed to find the time... In return, I've taken the time to combine what would have been parts 5, 6 and 7, that by themselves would've been annoyingly short. Here goes:

Handle number literals

So far our program have only handled numbers as an accidental by-product of not doing any type checking of any kind, and even then only if you have an external function that produces them.

So lets take a look at how gcc handles integers up to long long for C. Again we're restricting it to 32bit x86:

void foo1(unsigned short a) {}
void foo2(signed short a) { }
void foo3(unsigned int a) {}
void foo4(signed int a) { }
void foo5(unsigned long a) {}
void foo6(signed long a) {}

void foo7(unsigned long long a) { } void foo8(signed long long a) {}

int main() { foo1(1); foo2(2); foo3(3); foo4(4); foo5(5); foo6(6); foo7(7); }

I'm omitting most of the gcc output, you can run gcc -S yourself if you care. The interesting bit is the calls to the various functions, which end up like this:

    movl    $1, (%esp)
    call    foo1
    movl    $2, (%esp)
    call    foo2
    movl    $3, (%esp)
    call    foo3
    movl    $4, (%esp)
    call    foo4
    movl    $5, (%esp)
    call    foo5
    movl    $6, (%esp)
    call    foo6
    movl    $7, (%esp)
    movl    $0, 4(%esp)
    call    foo7

In other words, for the purposes of a function call at least, gcc threats all types except long long as a single 32 bit value. We'll conveniently forget about long long and stick to values up to 32 bit then, since that means we can keep ignoring types completely.

Laziness is a virtue.

We'll also blatantly ignore floats? Why? Because you can easily write a compiler with only integer math, and so adding float support now would only be a waste of time, when things are bound to change down the line anyway.

Besides, when I was young we didn't have those fancy FPU's, sonny, and we did just fine anyway with fixed point calculations using integers.

So what do we actually change?

In #get_arg, before we handle string constants, we add this:

  return [:int, a] if (a.is_a?(Fixnum)) 

In #compile_exp, we add this to use the return values from #get_arg:

  elsif atype == :int then param = "$#{aparam}" 

Aaand, we're done. That was almost to easy.

Here's a test:

prog = [:do,
  [:printf,"'hello world' takes %ld bytes\\n",[:strlen, "hello world"]],
  [:printf,"The above should show _%ld_ bytes\\n",11]
]


Intermission: Some thoughts on primitive data types

"Pure" object oriented languages are great, but not at the relatively low level of the code generator, in my opinion. Whether or not you want to implement a pure OO language, I strongly believe that it's worth implementing primitives that can work on primitive data types first. Then you can decide whether or not to hide those primitives completely from the user at a later stage, or find ways to make them look like objects, or quietly convert them back and forth as needed behind the users back.

Note that the Matz Ruby Interpreter (MRI) does this: Numbers for example are quietly treated differently than "real" objects, but the interpreter does it best to hide this fact from the user. Personally I don't think MRI goes far enough.


If ... then ... else

You don't get very far without some kind of conditional logic. some form of if .. then .. else ... exists in most languages that wants to be useful. For now we will implement [:if, condition, if-arm, else-arm], and do it the C way, in other words both 0 and null pointers evaluate to false, everything else to true.

Again a simple test:

void foo() {}
void bar() {}

int main() { if (baz()) { foo(); } else { bar(); } }

Giving the relevant part:

    call    baz
    testl   %eax, %eax
    je  .L6
    call    foo
    jmp .L10
.L6:
    call    bar
.L10:

This is a pretty much universal pattern for compiling if .. then ... else, across a huge number of languages and architectures:

  • Calculate the expression that occurs in the condition
  • Test it in some way (here with "testl" - other variations that are common for various architectures is a "cmp" [compare] instruction or to subtract the register from itself). testl does a comparision of the left and right operands, and set various flags.
  • Next, do a conditional branch to the
  • else arm. In other words, we're checking if the condition is NOT true. In this case it's done with "je" which translates to "jump on equal", in other words if the preceding comparison marked the result as equal (note that on most CPU's, lots of instructions set the condition codes, not just explicit tests).
  • Then execute the "if-arm".
  • Jump over the else-arm to after the end of the entire if..then..else construct.
  • Place a label for the else-arm, and the code to execute it.
  • Place a label marking the end.

There are lots of variations, such as reordering the if/else arms depending on estimates on which condition is most likely and knowledge of whether a missed or taken branch is cheapest on a particular architecture, but the above is all that's needed for now.

Again the compilation is deceptively simple - it's really just a direct translation of the above:

  def ifelse cond, if_arm,else_arm 
    compile_exp(cond) 
     puts "\ttestl   %eax, %eax" 
     @seq += 2 
     else_arm_seq = @seq - 1 
     end_if_arm_seq = @seq 
     puts "\tje  .L#{else_arm_seq}" 
     compile_exp(if_arm) 
     puts "\tjmp .L#{end_if_arm_seq}" 
     puts ".L#{else_arm_seq}:" 
     compile_exp(else_arm) 
     puts ".L#{end_if_arm_seq}:" 
  end 

It should be pretty self explanatory - it's basically calling compile_exp for each of the condition, if-arm and else-arm and weaving that into the above code, using @seq to generate unique labels.

To hook it in, we add this right after "return defun ...." in #compile_exp:

  return ifelse(*exp[1..-1]) if (exp[0] == :if) 

And here's a simple test:

prog = [:do,
  [:if, [:strlen,""],
    [:puts, "IF: The string was not empty"],
    [:puts, "ELSE: The string was empty"]
  ],
  [:if, [:strlen,"Test"],
    [:puts, "Second IF: The string was not empty"],
    [:puts, "Second IF: The string was empty"]
  ]
]

Here is the result

As usual you run it like this:

$ ruby step5.rb >step5.s
$ make step5
cc    step5.s   -o step5
$ ./step5
ELSE: The string was empty
Second IF: The string was not empty
$

Some thoughts on loops

Unconditional loops are trivial to add to the language, but do we need to? The answer is strictly speaking no. We can already do loops with recursion, so why pollute the language?

For that to be a good solution, though, we need tail recursion elimination, and I'm not prepared to get into that right now. Tail recursion elimination, or it's more general form - tail call elimination - essentially means to identify cases where the exit out of a function consists of a call with the same or fewer arguments, followed by returning it's return value. If that is the case, you can reuse the current functions call frame for the arguments of the function you are calling, followed by a "jmp" instruction instead of "call". "jmp" doesn't push a return address onto the stack, and so when the function that you jump to return, it will return to the place the current function was called from, rather than to the current function.

This achieves a couple of things: First and foremost, the stack doesn't grow. Secondly, you save some precious cycles. Combined with appropriate other optimizations, with tail call elimination, you can implement a loop like this without being worried about running out of stack:

[:defun, :loop, [], [:do, 
  [:puts, "I am all loopy"], 
  [:loop] 
], 

[:loop]

In other words, tail call elimination means that for any function on the form "(defun foo () (do bar foo))", stack usage goes from being proportional to the depth of the tail calls to being 1 frame deep at that point...

The above will run with the current compiler, but it'll quickly overrun the stack and crash. Not good.

I sense a disturbance in the force. All two geeks reading this shuddering at the thought of the stack growing with each iteration..

For now, we'll just ignore this, and not do anything to abuse the stack too much. Instead we'll later implement a proper loop construct as a primitive. At least for now - if/when I add tail call elimination we can consider ripping it out again and make it part of the runtime library instead.

So we can do non-terminating loops... That doesn't do us much good now does it?.

Well, we can already do while loops too:

(defun some-while-loop () (if condition (some-while-loop) ()))

Not very satisfying, but it works. It's too ugly to be an acceptable workaround, though, so a "while" primitive is on the list.

(As the realization that I've used Lisp like syntax sinks in, there's another disturbance in the force as the same two geeks as before shriek in agony and bolt for the door...)

I'm not a Lisp'er. I can't handle the parentheses overload... But Lisp syntax is convenient when the language doesn't really have a syntax of it's own yet - that'll come after bootstrapping when it's time to write a proper parser. Incidentally, most of what I've implemented and will implement will bear some kind of resemblance to badly broken LISP - as it turns out, Lisp constructs are very powerful if you can stand the syntax, and even if you don't want to program in it, learning more about Lisp and/or Scheme is worthwhile.

Personally I don't heed my own advice, and most of the Lisp-like ideas I'll be using are probably based on incomplete understanding from the very limited time I've spent looking at Lisp.

Next time: Anonymous functions, maybe more.


About me

E-mail: vidar@hokstad.com
Skype: vhokstad
View my LinkedIn profile

I was born April 21st, 1975, in Oslo, Norway. Since 2000 I've been living in London, UK. I'm married.

I'm working for Aardvark Media as Director of Technology. I'm also currently on the board of SpatialQ, a startup in the GIS space, and an advisor to Skoach, a startup doing a time management app for people with ADD.

Categories

StumbleUpon My link page

(Links I have stumbled and like)