A brief introduction to Semantic Dictionary Encoding


I've read a lot of research papers over the years, but there I keep coming back to like Professor Dr. Michael Franz' doctoral dissertation: Code-Generation On-the-Fly: A Key to Portable Software (PDF). I've been harping about Semantic Dictionary Encoding (SDE) ever since I first read the paper back in 1994, and got quite close to actually implementing at one point. So let me give a brief overview, since I can't drag it out often enough (someone take it mainstream, damn it!): SDE is, at it's most basic, a compression mechanism for the intermediate representation of a compiler. Similiarly to Huffman coding, an SDE encoder and decoder pair both builds a dictionary of the datastream while building it and decoding it respectively. The dictionary entries is the used to encode/decode the following part of the data stream, which cause more entries to be added to the dictionary, etc., meaning that the later you are in the data stream, the more of the stream will refer to previously added dictionary entries instead of having to be all verbose and include the full definition. But to call it just a compression method would be selling it very short. The appeal of SDE is that it can be used as a machine independent representation of a program, while at the same time it can (if used properly) retain far more semantic information about the program than a typical bytecode. Contrary to most bytecode, SDE retains the structure of the program as represented by the compiler. This has a few very appealing properties beyond the nice parts of JIT (such as being able to target specific CPU versions): Note that the some of the above is beyond the scope of the original dissertation, though all of it is hinted at or implied by the architecture. One thing that may or may not be obvious from the above is that SDE, beyond efficiently encoding a tree version of the intermediate representation actually also turns the tree into a directed graph. Nodes further down in the tree get reused, linked the tree together at places where common sub-expressions are used. Compared to a typical VM, SDE has one potential drawback in terms of delayed code generation, and that is that the dictionary can potentially grow quite large. This is ameliorated by scoping - by marking the start of a scope in the dictionary, and then throwing away any data generated specific to the local scope when reaching the end of the local scope the dictionary can be kept fairly compact.

Performance concerns?

Franz' addressed performance reasonably thoroughly, showing that SDE in most cases - on a mid 90's system based around a M68040 CPU running at 40MHz - performed very reasonable compared to loading native binaries. Franz' mused that CPU speed at the time was increasing faster than disk speeds, and that there is/was a very real possibility that this difference might be reversed so that SDE might eventually get faster. I don't know if that held up - it'd be interesting to redo those tests. As for runtime performance, the paper contains a few benchmark tests against the then current MPW C implementation for Mac, and the tested Oberon code using SDE beat the corresponding C code hands down.

What's happened since?

Not much, unfortunately. Franz has continued work on various code generation methods, but Java came along around the same time and more or less monopolized interest the portability aspect. At least Mac Oberon had support for SDE, but as for much other great stuff coming out of the ETH computer science departments it was hampered by it's ties to Oberon, which fell out of favor together with Modula and Pascal type languages. Franz' publications page is well worth a look - he's continuing to churn out great stuff.

My interest in SDE

I first got interested in SDE as a potential feature to contribute to a longwinded debate on comp.sys.amiga.misc back in '94 about writing an open source replacement for AmigaOS. It fell by the wayside (but one of the original people, Aaron Digulla has stuck with it, and AROS is probably the closest we today have to a real open source descendant of AmigaOS). At various times I've wanted to resurrect my original attempts, but never have had the time. This time? We'll see - I'm loosely thinking about looking at adding SDE support as one of the forthcoming steps for my series about writing a compiler in Ruby. I've learned a lot about compiler technology in the meantime, and I got quite far the first time - SDE isn't hard to do, anyway - so I'm hoping that I'll find the time to do a working implementation at some point in the not to far future.

Articles tagged 'semantic-dictionary-encoding'

2008-05-06
A brief introduction to Semantic Dictionary Encoding

I've read a lot of research papers over the years, but there I keep coming back to like Professor Dr. Michael Franz' doctoral dissertation: Code-Generation On-the-Fly: A Key to Portable Software (PDF).

I've been harping about Semantic Dictionary Encoding (SDE) ever since I first read the paper back in ...