An 'assembly' language for parsing 2005-03-26


I've mentioned my forays into push parsers previously. But after looking at that approach, I realised I needed a bit more flexibility. So I got the idea of designing a tiny assembly like language for building push parsers with. This is analogous to building an NFA or DFA, but with more operations, and the potential for being much easier to deal with manually.

I ended up with a tiny set of core commands, and I plan to add a few more convenience commands implemented in terms of the core. Here is the core command set I came up with:

RET -- Return with status flag set to true
BEQ -- Branch if status flag == true
BNE -- Branch if status flag != true
STO -- Store token buffer in numbered slot
TRG -- Trigger an event (used to "plug in" native code to build a parse tree)
ERR -- Return with status flag set to true. Unget any characters retrieved in subroutine, and revert token buffer to pre-subroutine state
CLR -- Clear token buffer
JSR -- Jump to subroutine. Push unget buffer and token buffer to stack.
JMP -- Jump
CMP -- Compare to input character, and set status flag accordingly. Yields if no input.
EAT -- "Eat" input character and add to token buffer and unget buffer. Yields if no input.

In addition I have so far implemented the following two compound commands:
TRY char -- CMP ch; BNE n; EAT; n: ...
REQ char -- CMP ch; BEQ n; ERR; n: EAT;

I plan on adding a few more, including a "range compare" and possibly equivalent TRY/REQ variations, as well as a kleene star command, and possibly a "one or more" (i.e. "foo foo*") operator.

As far as I can see it would be trivial to implement a "compiler" to compile BNF into this language, and the VM is less than 300 lines of C++ (including lots of debug output). It would be trivial to JIT or build a code generator spitting out C/C++ or another language as well.

I plan to spend some time writing a basic assembler for it first (tired of adding calls to build it inline without any relocation/label support...). Then we'll see.

So far I like this approach a great deal better than lex/yacc or other compiler construction toolkits I've seen.


blog comments powered by Disqus