Parser generation from BNF 2005-04-17


I've been slow to post the last few days because I've been working on my parser assembler, and in particular a tool to generate assembly from BNF. I ran into two problems that's been slowing me down:

- I'm not quite comfortable with the trigger mechanism yet - the issue is that it's hard to gather up enough data so that a trigger is "self sufficient" in that I don't need to collect data from multiple trigger events before taking an action. I have a potential solution that I'll outline below.

- The second is code generation. I initially wanted to avoid building a parse tree from the BNF entirely, and just output assembly right away. Unfortunately that turned out to be extremely painful, so I've resorted to building a tree per BNF production, which seems to work. I now want to simplify the code - the code generation is about 500 lines. While that isn't much, I think I can do quite a bit better without losing much in terms of performance. The tree building here is directly related with the first problem...

One of my previous parser generators had a similar mechanism to my current one, but initially it built parse trees instead of triggering events. I think a hybrid approach might be better. The tree building code worked so that every rule would cause data to get generated. Just as in the BNF example I posted it allowed me to store the results in "slots". However my previous generator used named slots and was typed - returns from built in rules would generate strings, returns from productions would be structures that contained any slots used in that production.

The result was that you could easily build a full parse tree. However one of the reasons I started looking at a push parser was for the flexibility, and much of that is lost once you constrain yourself to building a full parse tree.

The reason just the trigger mechanism is limited is because I either have to pull the store instructions high up into the grammar, in which case it's hard to pick up just the pieces I want, or I need a way of distinguishing, say, multiple strings that would normally be stored in the same slot.

The parse tree approach fixes that by generating a structure so that the store instructions are local to each production - storing data into the structure tht will be returned instead.

An approach that might help is to allow this mechanism, but keep the triggers, so that the triggers still clear out the storage. That way I can still parse huge files, as well as pull the triggers far enough up in the grammar that I can easily process everything without having to coalesce data provided via multiple triggers.


blog comments powered by Disqus