Vidar Hokstad V2.0

Home Blog

Welcome! This is an ARCHIVED page from my old blog

In addition to taking a look at the entry below, why don't you also take a look at some other recent entries:


If you like what you see, please also sign up to the RSS feed

2005-03-26 13:11 UTC An ?assembly? language for parsing

« Starved for Logic |

Main | ACM Queue - On Plug-ins and Extensible Architectures »

An 'assembly' language for parsing

I've mentioned my forays into push parsers previously. But after looking at that approach, I realised I needed a bit more flexibility. So I got the idea of designing a tiny assembly like language for building push parsers with. This is analogous to building an NFA or DFA, but with more operations, and the potential for being much easier to deal with manually.

I ended up with a tiny set of core commands, and I plan to add a few more convenience commands implemented in terms of the core. Here is the core command set I came up with:

RET -- Return with status flag set to true
BEQ -- Branch if status flag == true
BNE -- Branch if status flag != true
STO -- Store token buffer in numbered slot
TRG -- Trigger an event (used to "plug in" native code to build a parse tree)
ERR -- Return with status flag set to true. Unget any characters retrieved in subroutine, and revert token buffer to pre-subroutine state
CLR -- Clear token buffer
JSR -- Jump to subroutine. Push unget buffer and token buffer to stack.
JMP -- Jump
CMP -- Compare to input character, and set status flag accordingly. Yields if no input.
EAT -- "Eat" input character and add to token buffer and unget buffer. Yields if no input.

In addition I have so far implemented the following two compound commands:
TRY char -- CMP ch; BNE n; EAT; n: ...
REQ char -- CMP ch; BEQ n; ERR; n: EAT;

I plan on adding a few more, including a "range compare" and possibly equivalent TRY/REQ variations, as well as a kleene star command, and possibly a "one or more" (i.e. "foo foo*") operator.

As far as I can see it would be trivial to implement a "compiler" to compile BNF into this language, and the VM is less than 300 lines of C++ (including lots of debug output). It would be trivial to JIT or build a code generator spitting out C/C++ or another language as well.

I plan to spend some time writing a basic assembler for it first (tired of adding calls to build it inline without any relocation/label support...). Then we'll see.

So far I like this approach a great deal better than lex/yacc or other compiler construction toolkits I've seen.


About me

E-mail: vidar@hokstad.com
Skype: vhokstad
View my LinkedIn profile

I was born April 21st, 1975, in Oslo, Norway. Since 2000 I've been living in London, UK. I'm married.

I'm working for Aardvark Media as Director of Technology. I'm also currently on the board of SpatialQ, a startup in the GIS space, and an advisor to Skoach, a startup doing a time management app for people with ADD.

Recent posts to my blog

StumbleUpon My link page

(Links I have stumbled and like)