NOTE: This is an archived page from my old blog
Vidar Hokstad's random musings: April 2005 Archives


On HTTP abuse

Ryan Tomayko has an interesting entry on HTTP Abuse and more specifically about the lack of proper tool support and knowledge of how to make full use of HTTP.

Web 3.0 as an operating system

In Web 2.0? Try 3.0, Dan Gillmor asks:

If the web is becoming an operating system in its own right, can anyone monopolise it the way Microsoft did on personal computers? As long as the web’s basic functions remain open, the threat is more theoretical than real.

I think the threat of monopolisation of the web as an application platform is very limited. One of the key aspects of the move towards web services is that anyone can make services available, and the formats are open because they need to be - if they're not open you'll see limited use.

The one possibility for Microsoft or someone else is to build tools that provide a killer app for interaction between sites in a proprietary way, and make them widespread enough to become "required". However, given Microsoft's failure to get even Passport to become widespread, I think the risk of this is low.

Particularly given the ease of duplication. It is extremely hard to create a compelling service on the web that can't be duplicated easily. The only real barriers are copyrights on content (you could point to patents, but patents are usually possible to work around) and the brand loyalty of an existing userbase.

Neither of these is likely to be as significant an obstacle as, for instance, Microsoft's near monopoly on Office applications has been.

The one area that is a bit worrying is video and audio content with DRM, but recent events have demonstrated quite clearly that DRM in general is easily breakable, and only a barrier to people who have "acceptable" alternatives that remove the incentive to spend time digging up tools to break the protection.

As long as we are vigilant about fighting any attempts to take away our ability to break DRM through legislation I think we'll be just fine. But even with a repressive legal environment DRM will continue to get broken as quickly as new schemes are created.

But for other types of web services we also need to remember that it is in a service's best interest to be available to as many people as possible - unless a proprietary solution truly offers massive advantages it is unlikely to ever get traction. Locking out a few percent of the userbase can be enough to eradicate your operating margins and give your competition the advantage they need to demolish your business.

Microsoft were able to establish their monopoly in large part because they did not face organized opposition from any large movement strongly focused on open standards. Any proprietary service provider attempting something similar today will have to expect strong opposition.

Consider for instance the recent Bitkeeper mess as an example of the kind of reaction you're likely to face if your attempts at lock-in online get too serious, or even if you simply don't release the source to a service the open source community sees as important.

Turtle progress and parser assembler

After my success with the BNF generator yesterday, I just had to redo my Turtle parser based on the BNF in the Turtle spec. Thanks to the clarity of the grammar I rewrote the parser in less than an hour. Though "rewrote" is exaggerating - I cut and pasted the BNF from the spec, massaged it a bit (for one, I've realised that I messed up the priority of the "or" operator, so I need to parenthesise bits - I'll have to fix that), added the appropriate triggers, and started adding error handling.
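To illustrate the precedence problem (in a hypothetical BNF-style notation, not necessarily the exact dialect my tool uses), consider how much the meaning of a rule changes depending on how tightly "|" binds:

```
directive ::= "@prefix" pname uriref "." | "@base" uriref "."

# If sequencing binds tighter than "|" (the usual convention), this reads:
#   ("@prefix" pname uriref ".") | ("@base" uriref ".")
# But if "|" binds tighter, the same line silently means:
#   "@prefix" pname uriref ("." | "@base") uriref "."
# which is a different language entirely - hence the need to
# parenthesise the alternatives explicitly until the generator is fixed.
```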

It seems to pass all the tests referenced in the spec now, though I've taken one or two shortcuts with character set handling that need to be fixed.

However doing the Turtle parser now and the XML parser last night made me think of one important improvement to make, though it will cause the BNF generator to grow quite a bit:

A lot of the error handling can be added automatically. The key is that a recursive descent parser should generally report an error at the first position where you know there are no further valid paths. In other words, you "only" need to create a graph and do some basic path analysis to figure out that in Turtle, for instance, once you've successfully swallowed "@prefix", no other rule can handle the input if you were to backtrack to the "@".

As a result, you should give an error message immediately if the next step after "@prefix" doesn't match expectation, or if the remainder of the rule as a whole fails.
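As a rough illustration of the commit-point idea (a hand written C++ sketch with invented names like PrefixParser, not the parser assembler itself): once "@prefix" has been matched, no other rule can accept the input, so instead of backtracking to the "@" the parser reports an error immediately, with an exact position.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch: after "@prefix" is consumed we are committed to
// this rule, so any failure from that point on is reported as an error
// with a precise position instead of triggering backtracking.
class PrefixParser {
public:
    explicit PrefixParser(std::string input) : in_(std::move(input)) {}

    // Parses '@prefix <name>' and returns the prefix name, or throws
    // once we are past the commit point.
    std::string parsePrefixDecl() {
        expect("@prefix");   // after this, no other rule can match
        skipWs();
        std::string name = parseName();
        if (name.empty())
            fail("expected prefix name after '@prefix'");
        return name;
    }

private:
    void expect(const std::string& s) {
        if (in_.compare(pos_, s.size(), s) != 0)
            fail("expected '" + s + "'");
        pos_ += s.size();
    }
    void skipWs() {
        while (pos_ < in_.size() && in_[pos_] == ' ') ++pos_;
    }
    std::string parseName() {
        std::size_t start = pos_;
        while (pos_ < in_.size() &&
               (std::isalnum(static_cast<unsigned char>(in_[pos_])) ||
                in_[pos_] == ':'))
            ++pos_;
        return in_.substr(start, pos_ - start);
    }
    [[noreturn]] void fail(const std::string& msg) {
        throw std::runtime_error(msg + " at position " + std::to_string(pos_));
    }

    std::string in_;
    std::size_t pos_ = 0;
};
```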

At the same time, this analysis allows significant optimisations of the code: by factoring out the common parts of rules, they can be combined and the code paths merged.

This is essentially what a good NFA/DFA generator will do, and when you look at it, the parser assembler isn't that far from a DFA converted into the steps to execute. The fact that it's human readable, portable and easy to customise or hand write code for is the main advantage as I see it, not the functionality of the VM itself.

As it stands, though, the Turtle grammar generated by the BNF generator takes only 30 bytes more than the hand written one I've just retired, and that will have to do for now while I experiment some more (the bug list for the BNF generator and parser assembler is already getting quite long, and it will get longer, but so far all of the bugs have reasonable workarounds, so I'm just recording them for now).

All in all, the Turtle parser C++ and BNF combined is now down to just 264 lines of code (without generic support code like the VM at 369 lines), which I'm quite pleased with. To do something useful (like building an efficient RDF graph) will obviously require quite a bit more.

XML Parser with my parser assembler

One of the best ways of testing a new translator/compiler/interpreter is to feed it complex input. So I set about creating an XML parser.

It took me less than 4 hours to get a more-or-less working SAX parser for the complete XML grammar, and more than half of that was troubleshooting the parser assembler (I found a few problems, and a few places I need to make enhancements).

Now, this is what prompted me to look into a push parser architecture in the first place.

The XML grammar was essentially just the grammar from the W3 XML specification with some minor changes to match my BNF variation and add triggers for semantic actions.

It's not 100% complete yet - for instance I don't restrict all the character classes correctly - but it's fairly close. Also, I don't handle any encodings properly yet, but the parser architecture makes that fairly easy to do: I need to add support for byte order markers, and then place a trigger on the encoding attribute and let the trigger handler plug in an object that filters the input for the right encoding (provided it's one I want to support).

The entire parser assembler is prepared to handle full Unicode anyway - it's all 32 bit characters internally.
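A minimal sketch of that pluggable-filter idea, under assumed names (Decoder, Latin1Decoder - illustrative, not from my actual code): the parser core only ever consumes 32 bit characters from whatever decoder the encoding trigger plugs in, so supporting a new encoding means writing a new decoder, not touching the parser.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Interface the parser core reads from: a stream of 32 bit code points.
// The encoding trigger can swap in a different implementation mid-stream.
struct Decoder {
    virtual ~Decoder() = default;
    // Decodes the next character; returns false at end of input.
    virtual bool next(uint32_t& out) = 0;
};

// Trivial example decoder: in Latin-1, each byte maps directly to a
// Unicode code point.
class Latin1Decoder : public Decoder {
public:
    explicit Latin1Decoder(std::string bytes) : bytes_(std::move(bytes)) {}
    bool next(uint32_t& out) override {
        if (pos_ >= bytes_.size()) return false;
        out = static_cast<unsigned char>(bytes_[pos_++]);
        return true;
    }
private:
    std::string bytes_;
    std::size_t pos_ = 0;
};

// Stand-in for the parser core: it sees only 32 bit characters and is
// entirely ignorant of the byte-level encoding.
std::vector<uint32_t> drain(Decoder& d) {
    std::vector<uint32_t> chars;
    uint32_t c;
    while (d.next(c)) chars.push_back(c);
    return chars;
}
```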

I love my new toy :)

April 22, 2005

My BNF -> parser assembler translator works!

As part of my parser assembler work, I've gotten my BNF to parser assembler translator to translate its own BNF into a parser assembler program equivalent to its own parser... In other words, it's successfully bootstrapped.

There are still some holes to plug, and the code is a real mess, but this means that the BNF generator is somewhat usable... Saves a lot of work - the BNF for the BNF parser is around 60 lines including whitespace, while the generated "assembly" is 340 lines. The 60 lines include trigger annotations and some error handling. That brings the total for the BNF tool to 600 lines (BNF + C++) - not bad, though it'll easily grow a few hundred lines as I clean up the code and turn it into a bit more polished application (currently it only reads from standard in and writes to standard out, for instance...). There's also certainly room for generating more optimal "assembly".

If I get time I'll post the code this weekend or next (depending on whether I actually get myself to read for my exam on Monday).

'Infomania' worse than marijuana

BBC Online: 'Infomania' worse than marijuana

Workers distracted by email and phone calls suffer a fall in IQ more than twice that found in marijuana smokers, new research has claimed.

I assume they're referring to a temporary "loss" while you're in a state of being distracted and dealing with the interruption... Otherwise I'd probably have a negative IQ by now ;)

But it does underscore my comment the other day about setting a schedule and sticking to it, including one or more slots in your schedule for dealing with e-mail.

April 21, 2005

ACM Queue - UML Fever: Diagnosis and Recovery: Sidebars from Scott Ambler and Craig Larman

Via Peter O'Kelly's Reality Check:

ACM Queue - UML Fever: Diagnosis and Recovery: Sidebars from Scott Ambler and Craig Larman

I particularly loved this quote:

'Starting in the late 1990s I became curious how the UML CASE tool vendors (including Rational, TogetherSoft, and others) built their own UML tools. So I started to communicate with some of the developers or managers of these tool projects. When I posed my standard question, “Did you use the UML and your UML tool to create its next release?” the answer–in every case–was “We didn’t” and typically a fleeting facial impression which suggested either embarrassment or “Are you crazy?!?”'

However the thing that really hit home was this:

'My observation is that lean and effective organizations conduct upward of 90 percent of all modeling as sketches on paper or whiteboards, with only a few using sophisticated software-based modeling tools. The reality is that the value is typically in the modeling effort itself, not in the model. Why is it that so few software modeling books or academic papers seem to focus on sketching in terms of recommended best practices? Shouldn’t we help developers to get better at what effective modelers actually do in practice, instead of trying to inflict the questionable visions of tool vendors and/or inane academic theories on them?'

My biggest problem with UML tools has always been that they don't let me sketch. All UML tools I've used try to enforce a specific vision of how I should work with UML and try to force me to draw diagrams that look a particular way.

Now, obviously UML is supposed to be "standard". However, there are still a number of ways most of the tools deviate from each other in the look of diagrams, and often what you want is NOT standard UML, but almost-standard. You want to fudge something - to say "roughly like this [insert handwaving here]".

Invariably, tool shortcomings have forced me either to revert to generic diagram tools, which are hellish to use for complex diagrams (why does no one do diagram editors right?), or to post process the diagrams from a UML tool in a drawing program (Gimp isn't exactly the most suitable UML tool available, but when the damn editors can't do their job, it's invaluable).

In general, the problem is that most of the tool writers seem to believe they know the "only true way" of using their tools, while in reality humans have very specific ways they like to work with concepts, diagrams and models, just as with code. If the tool doesn't fit that model, people will go to extraordinary lengths to work around the tool and keep working the way they are comfortable with.

A good UML tool would work with the user, not against him/her, by actively supporting the user in violating the model when that is what the user wants. Fine, show an unobtrusive warning, but don't prevent the user from creating the model they want.

Do I believe in UML based code generation?

Not with the current generation of tools. In general, I do believe code generating tools can work, but they'll need to be designed to be flexible enough for the user to bend them into doing what the user wants, instead of enforcing a model that does not fit the way the user works, and that will take a lot more research.

MySQL founders: Kill all the patents

Great article in eWeek on the anti patent views of two of the MySQL co-founders (and some notes on MySQL 5.0): MySQL Founders: Kill All the Patents

'[Patents] just stall innovation. Look at an extension of patents. I don't see any difference in a software program and a recipe in a book. It's the same thing to me as a programmer. It's them saying, "You're not allowed to write the sentence you're writing right now because somebody patented it."'

Information overload

In the last couple of months, I've added at least a hundred feeds to my aggregator. At the same time, I've suffered from information overload in my e-mail for years, to the point where I now mostly just scan the subjects in my main mailbox and direct anybody I actually want to be sure of answering quickly to one of several other addresses, depending on context.

The problem is I want information. I'm an information junkie.

The problem is that most of what I receive is noise. 99.9% of the e-mail I receive (AFTER having removed the spam) is unimportant, and at least 95% of it is uninteresting.

The issue is bad filtering.

"Good filtering" is still a big problem, and one that can not be solved exclusively with Bayesian filtering.


One of the key problems I'm facing in managing my information flow is that my connections to people are often transient, and increasingly so - I might exchange 2-3 messages based on answers on a blog, and it might take a year before I talk to that same person again unless we're generally interested in the same things. The incoming message probably won't be junked by my spam filter, but that's not the problem - the problem is determining whether or not a message is more important than another.

Another issue is classification: a strict hierarchy works for a lot of tasks - for instance, at work I use a deeply nested, very static folder hierarchy for managing my projects. However, at home that doesn't work, as my interests change rapidly with regard to more peripheral areas. I do, for instance, find OpenGL programming fascinating, but it's not important enough for me to spend time on regularly, so there will be occasional bursts. At the same time, some messages about OpenGL will always be interesting because the content has other important factors (i.e. some new development, or someone has made a great new rewrite of "Elite", one of my all-time favourite games).

The problem is that many of these interests are also transient enough that I can't be expected to train a Bayesian filter to recognise them on word frequencies alone - I need more context.

I want a neural network and a set of agents to monitor my general activity and build a profile of me.

I want a system that will sort my mail not just based on my mail, but based on the fact that I wrote an entry about a similar topic on my blog a week ago, or that one of my (unpublished, unmentioned) programs contains comments that seem similar, or that I've written a (private) diary entry about it, or that the person writing it happens to be the operator of a website I spend a lot of time on, or the owner of a blog I read regularly.

I want the same for my RSS aggregation - it needs to be powered by an engine that knows me.

Yes, I'm difficult, and I realise I'll probably have to settle for less for a long time.

Parser assembler update

My BNF to parser assembler translator is coming along nicely, although slower than I had hoped. My plan is to finish most of it within a week or so, and put together a webpage with the source code etc. the following weekend. Then it'll be time to go back and test it properly with the Turtle grammar before I move back on to my XML parser.

So I'm finally 30...

I'm starting to feel like I have to grow up soon now... ;)

April 20, 2005


BrainBoost

BrainBoost is a search engine that analyses the text on a page to try to extract the text that answers a natural language question.

I played with this idea 5-6 years ago, but shelved it partly because I don't know enough about natural language processing, and partly because there's a huge problem:

How to determine whether an "answer" is reliable. When you present just an extracted answer as opposed to a full page, you make it significantly harder for the user to determine the reliability of the source.

You also face the problem of how to determine which of several mutually conflicting answers is right (and contrary to Google's PageRank, you can't determine reliability based on popularity).

Despite that, though, BrainBoost is interesting to play with - it's always fun to play with various alternative search engine technologies and see what progress they make.

Crib notes for the Turing test

Crib Notes for the Turing Test

'Humans have a different term for everything!
"Forget" = deallocate memory
"Medicine" = debugging
"Sleep" = database regeneration
"Watching TV" = idle loop'

Say it with a postcard

PostSecret is a site that posts images of homemade postcards they receive that people have written their secrets on...

Some of them are quite remarkable, and most are intriguing for the combination of words and imagery and the fact that you get so little context and have no idea who the person who made the card is.

Managing time

One of my biggest challenges has always been managing my time. The main reason is because by nature I'm all over the place. If you've read my rants on RDF, parsing technology and the DOM for instance, you might have noticed how I started looking at RSS, decided I wanted a better C++ DOM parser, started playing with RDF and Turtle, got sidetracked into writing a parsing tool to help me write a Turtle parser, started writing a BNF based parser generation tool, before I started unwinding the stack, working my way back down to the original goal.

Eventually I'll get there, but while the above approach works fine for what is essentially a learning experience and a hobby, it does not work fine in a job setting where you need to deliver.

To me, these are the essential tools to managing my time:

  • A TODO list

  • A daily schedule

  • A list of what I've actually done

  • Keeping and maintaining a todo list makes it harder to brush tasks under the carpet. It forces me to realise how much stuff is waiting to be done. Let it grow, but prioritise so that you don't get overwhelmed by the choices.

    However, a todo list in itself is easy to ignore - I can easily just spend a day fiddling with other stuff and never once look at the todo list. Which is where my daily schedule comes in. It's something I started doing relatively recently, but which works wonders whenever I'm not motivated to work:

    Every morning, I sit down and pencil in time slots for what I want to do at which times during the day, including things like "update and review todo list". I take care not to fill everything, because something always pops up. But setting myself definite deadlines helps contain "dead time" to small, well contained breaks spread through the day.

    It also helps contain time eating bullshit like checking your e-mail every two seconds to within pre-allocated slots. If something is too important to wait for a slot 2-3 hours away, people will call you - there is simply no excuse for continuously checking your mail.

    The last item, keeping a list of what I've done is mostly for my conscience. If I see it doesn't fill up, I start thinking about how I'd justify it if asked. It does wonders for actually making sure I spend my time doing stuff I can justify spending time on.

    The list is important for another couple of reasons as well: it helps me improve my time estimates by documenting how much time something really took, and it helps me document the effort I put in at work, in case I'm ever in a situation where it's clear my effort isn't being noticed.

    I'm doing that one out of experience - it's far too easy for people not to notice what you're doing, particularly when you're doing it well and everything is going smoothly and people don't see you or your team running around frantically. Once that happens, you don't want to have to spend a lot of time digging up documentation to show what you were doing. You'll be far better off being able to present detailed notes, day by day, documenting it all, and chances are those notes can help head off what could easily become a deep conflict.

    A more macabre reason to keep detailed notes, including the daily list of what you're doing, is the big red bus scenario. That is, you step out in front of one. The better you are at documenting what you do, the better off the company will be.

    And while you might not care how the company will do if you are dead, think of it as setting a good example, in particular if you manage a team: within software engineering in particular, there's way too little appreciation of the time honoured scientific practice of keeping a detailed journal of what goes on.

    Having your team members take up the practice can easily save your ass down the line when one of them decides to take off with no warning, dies or ends up in hospital for an extended stay, is brutally wrestled from your team by another department, or otherwise becomes unavailable.

April 19, 2005

Uncovering the simplicity of programming

Subtext - an "example centric" programming language.

While I didn't particularly like Subtext itself from what I've seen so far (that doesn't say much, I'm extremely picky), I like a lot of the overall ideas, and there's a document on that site everyone should read:

Manifesto of the Programmer Liberation Front

It's a long read, but lots of interesting thoughts, including gems like this:

Terseness is prized in Math and Science. While being able to write Quicksort in 3 lines may seem like a powerful feature, it is in fact bad for the actual practice of programming. Theorems and Laws of Nature may last a century before becoming obsolete, and are generally about widely relevant concepts. Programs are lucky to make it to the next release, and are mostly about highly localized and detailed concerns. The primary use-case of programming is the modification of code written by someone else. Readability, as opposed to terseness, is far more important in programming.

Many programming languages are designed as if saving keystrokes was important. This might have been appropriate when we used teletypes to program, but it is not relevant in real life programming, where far more time is spent reading and thinking than typing, and especially when there is a smart IDE to fill in some of the typing.

There are some areas of the manifesto I disagree with, though. For instance, as much as I really want a programming language that uses a more structured approach, and is less tied to text, there is a fundamental problem with that:

Editors are everywhere, paper is even more widespread, and people write articles and textbooks.

Textual programs are easy to jot down notes about and reproduce on any kind of system. Anything that depends on graphics or a specialised tool is severely restricted in that regard.

I've spent a lot of time thinking about new approaches to programming languages myself, and one of the things I decided was that as much as I'd like to free myself from being tied to text, I could not do so completely. Instead, my approach was to start experimenting with a language design specified entirely by its semantics (this is not a new idea, btw.), with at least two representations: a textual, human readable representation suited for interchange, and a representation suited for editing and "exploration" in a structured tool.

I don't think Subtext would be unsuitable for that - it's mostly text based, after all, and what the tool provides seems to be more about offering a well defined IDE as the primary method of editing and working with programs as a structured graph.

And while you're thinking about this, read Stefano's take on it as well. I particularly liked his comment about how it's just like RDF.

Setting clear expectations

During my years of managing and being managed, there is one aspect in particular that I've noticed is often badly handled.

Setting expectations is a hard requirement for getting the performance you want out of somebody.

But it is also a hard requirement for keeping someone motivated.

From personal experience, I know that if I don't get clear expectations communicated to me, two things will happen:

- I'll find it hard to stay motivated, because I won't know what the right thing to do is. I might know what would be good for the product, but I won't know if that's what my manager wants, and I won't know if it will be rewarded or punished (with a scathing review, for instance), or even whether a conflicting strategy has been agreed "higher up".

- I'll work on what I want to believe is important, not on what the business believes is important. If you're working in an area where you're a visionary, that's cool to an extent. You can play around with whatever you want, and the results may be great. However, though you might know what is needed, you won't know what your boss will be measuring.

That uncertainty is destructive.

I've experienced it and hated it under bad managers, and I've seen what happens to others whenever I have been too preoccupied to set clear expectations for them.

Nothing is worse than a manager saying "I've got a hands off approach, so I'll let you do what you feel is best". Translation: I don't know what the heck you do, and I don't care enough to find out.

That does not mean that giving your team members freedom can't be good, or even important, with the right people. But it means that you must still sit down with them and set targets, goals and expectations.

And when you agree on those goals, you must follow them up and demonstrate that they matter (that you do bring them up in the year end review, for instance). Otherwise you can set all the targets you want, and most people will slowly test the limits and then start ignoring them - perhaps not intentionally, but because there are other things they want to spend their time on which they think are more important (and, they rationalise with good reason, you've admitted as much because you never did ask for that report that was 3 weeks overdue, so it couldn't have mattered that much - never mind that the report might have been there to set a deadline and force them to focus on doing the work they're reporting on in the first place).

Setting expectations of yourself is hard, and few people will consistently be able to do it and push themselves at the same time. Getting clear expectations from someone else lets you see where your focus should be.

Personally I'm the kind of person who works best with a deadline looming, so I prefer to have clear milestone deliverables, because that forces me to do the work in chunks rather than in a mad rush at the end of a project. The alternative requires a lot more discipline in following something up every day.

It's not always about performance - I can push myself to deliver on a regular basis, but that's where motivation comes in: it's demotivating to be the one who has to push yourself, instead of having someone give you clear feedback when you do well or when you don't deliver.

Another issue is ensuring the expectations are measurable. In a way that confuses the point, because an expectation that isn't measurable isn't really an expectation at all, but just a sentiment of what people want to feel about your performance when they think of it. "Should have finished a major project" isn't a real expectation unless your organization has a clear, unambiguous definition of what "major" means.

It still leaves you in the dark wondering whether you and your boss will both agree that project Foo, which had you working 80 hour weeks and living on caffeine, was major, or whether they think it wasn't such a big deal because it was delivered on time and on budget, and was really only meant to be a 2-3 month little thingie for you to do before it ballooned when someone added extra requirements.

Next time you sit down with your boss, ask for his/her expectations. Write them down. Keep those notes, look at them regularly, and ask if they've changed and if they've been met.

And if you manage, sit your staff down one by one and make sure they know what your expectations are.

You'll all be happier.

April 18, 2005

The "problem" with Global Menu Bars

Gregory Williams gets it.

He's criticising an OS News article called The Problem with Global Menu Bars.

As a past Amiga user I agree with almost everything he says.

The global menu bar is to me still a much more elegant solution than MDI for presenting a proper menu bar, and it helps save screen real estate (something which was essential back when 640x512, or 640x400 for poor NTSC users, was the resolution of choice).

It gives predictability, and made for a very lean interface.

As for saying "global" menus don't work well with multiple monitors, I don't agree. One of the key features of the Amiga was multiple draggable virtual screens. From that perspective, multiple monitors just means adding more "screens", and the menus were tied to the top of the currently active screen.

I don't even think the criticism is based on a power user's perspective - I liked the Amiga (and by extension the Mac) approach exactly because it freed up real estate that let me pack a lot more information onto the screen.

The point that there is a key difference between the application oriented approach of Windows and the document oriented approach of the Mac (and to an extent the Amiga) is good, though. The lack of Windows style MDI makes applications a lot more "invisible", and the global menu reinforces that by integrating the application into the overall UI.

This is one of many Amiga features I still miss on my current Linux setup.

China Builds Nano-operated Robot

If true, this report seems like a big leap forward. Unfortunately the report is extremely brief, and doesn't give much to go on.

Chinese nanotech research does appear to be rapidly advancing, though, with past announcements of material advances, and cooperation with Germany exemplifying their increased activity and relevance in the field.

EU Parliament starts second reading on software patents directive

From FFII: Time is running: European Parliament received "uncommon" position

Definition of irony: RIAA members worry about Steve Jobs' near monopoly power

It's hard not to take pleasure in this: Music moguls trumped by Steve Jobs?

Music labels are apparently increasingly worried over the price setting power Apple has gained by virtue of controlling around 70% of the music download market.

Considering this is a group that has gained its power mainly by brutally squashing competition and using their influence over media to make it near impossible for artists outside their systems to get major traction, it's about time they got a taste of their own medicine.

April 17, 2005

    Validating a parser generator

    A large part of validating a parser generator, such as my BNF to parser assembler tool is ensuring the generated parser is equivalent to a well tested hand written parser.

    Generally, my approach is to start bringing the hand written parser for the tool into line with a version of the parser for the tool specified in the tools own language as soon as I possibly can.

    The reason is that by being methodical and writing the parser in the same style you expect to use, you quickly identify problems with the generation, but you also make it trivial to compare the two.

    Even when it makes hand-writing the parser harder, it's worthwhile: by the time you've spent a long while on the code generation, you've probably debugged the hand-written parser quite a lot, and making the generated parser match so closely that a basic text comparison finds only minor differences reduces the validation process to a simple question:

    Do you trust the testing you've done of your hand written parser?
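    The comparison step itself is simple enough to sketch. Here is a throwaway Python illustration (the listings are made up; in practice you'd diff the hand-written source file against the generator's output):

```python
import difflib

# Hypothetical assembly listings: one hand-written, one generated.
# In practice these would be read from the two files being compared.
hand_written = """:rule
    req $sub_expr
    trg #9
    ret"""

generated = """:rule
    req $sub_expr
    trg #9
    ret"""

diff = list(difflib.unified_diff(
    hand_written.splitlines(), generated.splitlines(),
    fromfile="hand_written", tofile="generated", lineterm=""))

# An empty diff reduces validation to trusting the hand-written parser's tests.
print("identical" if not diff else "\n".join(diff))
```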

    Don't like what the annual report on International Terrorist activity says? Just kill it

    From MetaFilter: We are winning the war on Terror? Not.

    Atheist website censored as 'occult'

    Via Atheist Revolution: The atheist website God is for Suckers! has been blocked by SonicWall's censorware, in yet another example of why censorware is bad for you.

    The problem I have with censorware is that the people being censored are rarely in a situation where they would be comfortable asking for a misclassified site (and classifying an atheist site as "cult/occult" is hardly accurate) to be unblocked.

    A parent, for instance, who considers it acceptable to censor in the first place is not the kind of parent a child would be likely to approach about a site with a name like "God is for Suckers", even if it was not what the parent intended to block in the first place.

    The most unfortunate effect of censorware is that the censor has the ability to extend what is censored far beyond the bounds of what their customers expected, with little or no recourse for the people actually affected.

    Mozillazine discussion on SVG in Mozilla/Firefox

    I have been waiting for Firefox to support SVG for ages now.

    It's one of the largest missing pieces needed to turn Firefox with XUL into a full-fledged portable application platform. There are other gaps too, such as 3D support and media support (without plugins that may or may not be available on all platforms). But with SVG, at least the 2D graphics area is reasonably well covered.

    And I'm looking forward to fun stuff: monitoring apps using AJAX (i.e. XMLHttpRequest) to request data updates and add elements to an SVG drawing instead of reloading an image; map applications using SVG for their information overlays (or even the full map, though I'm unsure how that would work out performance-wise); and AJAX-style business graphing applications (reporting interfaces with fancy graphics will suddenly become a lot easier to do in a browser).

    Not to mention how quickly someone will use SVG together with Javascript 3D transformations to do 3D graphics using SVG 2D primitives (yes, it will be slow).

    "I lined up at the wrong theatre for Star Wars, and all I got was this lousy t-shirt"

    Wil Wheaton has been out taunting Star Wars nerds, pointing out that waiting in line at the wrong cinema as an act of protest (it became an act of protest only after someone pointed out that they were waiting at the wrong cinema, presumably in an attempt to appear less stupid) is kind of hilarious.

    Oh, and he made them "I lined up at the wrong theatre for Star Wars, and all I got was this lousy t-shirt" t-shirts

    Parser generation from BNF

    I've been slow to post the last few days because I've been working on my parser assembler, and in particular a tool to generate assembly from BNF. I ran into two problems that have been slowing me down:

    - I'm not quite comfortable with the trigger mechanism yet. The issue is that it's hard to gather up enough data for a trigger to be "self-sufficient", so that I don't need to collect data from multiple trigger events before taking an action. I have a potential solution that I'll outline below.

    - The second is code generation. I initially wanted to avoid building a parse tree from the BNF entirely, and just output assembly right away. Unfortunately that turned out to be extremely painful, so I've resorted to building a tree per BNF production, which seems to work. I now want to simplify the code - the code generation is about 500 lines. While that isn't much, I think I can do quite a bit better without losing much in terms of performance. The tree building here is directly related to the first problem...

    One of my previous parser generators had a similar mechanism to my current one, but initially it built parse trees instead of triggering events. I think a hybrid approach might be better. The tree building code worked so that every rule would cause data to get generated. Just as in the BNF example I posted it allowed me to store the results in "slots". However my previous generator used named slots and was typed - returns from built in rules would generate strings, returns from productions would be structures that contained any slots used in that production.

    The result was that you could easily build a full parse tree. However one of the reasons I started looking at a push parser was for the flexibility, and much of that is lost once you constrain yourself to building a full parse tree.

    The reason the trigger mechanism alone is limited is that I either have to pull the store instructions high up into the grammar, in which case it's hard to pick out just the pieces I want, or I need a way of distinguishing, say, multiple strings that would normally be stored in the same slot.

    The parse tree approach fixes that by generating a structure so that the store instructions are local to each production - storing data into the structure that will be returned instead.

    An approach that might help is to allow this mechanism, but keep the triggers, so that the triggers still clear out the storage. That way I can still parse huge files, and still pull the triggers far enough up in the grammar that I can easily process everything without having to coalesce data provided via multiple triggers.
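    A rough Python sketch of that hybrid (all the names here are invented for illustration - the real mechanism lives in the parser VM, not Python): store instructions fill per-production slots, and firing a trigger hands the accumulated slots to a callback and then clears them, so the parser never holds more than one production's worth of data.

```python
class TriggerContext:
    def __init__(self):
        self.slots = {}

    def store(self, slot, value):
        # A store instruction appends into a named slot.
        self.slots.setdefault(slot, []).append(value)

    def trigger(self, name, callback):
        # Hand the callback a self-sufficient snapshot, then reset,
        # so huge inputs never accumulate unbounded state.
        callback(name, self.slots)
        self.slots = {}

events = []
ctx = TriggerContext()
ctx.store("name", "tdef")
ctx.store("number", "6")
ctx.trigger("TDEF", lambda name, slots: events.append((name, slots)))

assert events == [("TDEF", {"name": ["tdef"], "number": ["6"]})]
assert ctx.slots == {}  # storage cleared, ready for the next production
```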

    April 16, 2005

    Are you a victim of age discrimination?

    Ted Rall's comics are rude, extremely politically biased (you're not likely to like most of his stuff if you're a Republican, though the Democrats get a few bloody noses as well), frequently tasteless, and always politically incorrect.

    Are you a victim of age discrimination? is perhaps one of the "safer" ones, though still blatantly biased.

    Quite a few of his comics - even ones where I agree with the main point - are just too tasteless to be funny, even for me, and I generally like quite bizarre humour.

    Yet I'm still drawn back time after time because I love the fact that he's prepared to be so extremely in your face and brutal about putting his points across, even though it must earn him tons of hate mail...

    April 15, 2005

    Matt Dillon on Dragonfly journalling FS

    I've never been much of a BSD guy - I tend to run Linux at home - but I do admire Matt Dillon a lot. A co-worker mentioned this to me, and it's just immensely cool.

    Dillon is working on a journalling FS that allows "unlimited" (i.e. limited only by available journal space) undo. I love it. The potential is huge for things like security (an intruder can delete whatever they want, and you'll just undo the changes - and voila, you can look through all your logs to figure out what he's been up to) and safety (not just undelete, but reverting a system as a whole to a known good state).
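    To illustrate the principle (a toy Python analogy, nothing to do with Dillon's actual on-disk design): every mutation records its inverse in a journal, and undo is just replaying the journal backwards.

```python
class JournalledStore:
    """Toy key/value store with journalled undo (illustration only)."""

    def __init__(self):
        self.data = {}
        self.journal = []

    def write(self, key, value):
        # Record the inverse operation before mutating.
        self.journal.append((key, self.data.get(key)))
        self.data[key] = value

    def undo(self, steps):
        # Replay the journal backwards to a known good state.
        for _ in range(steps):
            key, old = self.journal.pop()
            if old is None:
                del self.data[key]
            else:
                self.data[key] = old

store = JournalledStore()
store.write("/etc/passwd", "clean")
store.write("/etc/passwd", "tampered")  # the intruder's change...
store.undo(1)                           # ...is one journal entry away
assert store.data["/etc/passwd"] == "clean"
```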

    I remember discussing this with a colleague 7-8 years ago, mostly as a joke (my colleague wanted "everything" to be undoable, not just filesystem operations), and I'm sure others have thought about it too. But what matters is actually doing it...

    Looking forward to someone doing the same thing for Linux ;)

    My Semantic Web reading for the weekend (and the next couple of months)

    I just stumbled on Dave Beckett's Resource Description Framework (RDF) Resource Guide. I wish I had known about this when I was writing my semantic web essay last month... An absolutely amazing list of resources.

    Harald Welte: Giving the GPL teeth

    For a second time, Harald Welte has managed to secure a preliminary injunction against a likely GPL violator.

    Add to that his tireless work discussing the GPL with alleged violators and handing out large numbers of cease-and-desist letters, and Harald has become one of the foremost people protecting GPL'd software from illegal use by companies that want to keep the source closed.

    April 14, 2005

    Whiteboard Fountain

    Why don't we have a Whiteboard Dry Erase Stainless Steel Water Wall Fountain in the office? Some nice running water AND a whiteboard in one! And so cheap at only $5,998.00!

    XAML vs XUL with a bit of Entity thrown in

    I haven't bothered following Avalon and XAML a lot, so mezzoblue: Avalon/XAML First Look was a very interesting quick overview.

    My impression is that this isn't very interesting.

    XUL shows significantly more promise in the pure UI space, for three reasons: 1) it's cross-platform, 2) it's well integrated with the DOM, and 3) Firefox forms a reasonably mature platform to build applications on top of.

    I realise from the examples that Avalon/XAML has a somewhat wider scope - for instance there is no 3D support in current XUL implementations, and no SVG support in the standard builds. However at least SVG support for Firefox is now right around the corner.

    As for projecting stuff onto spheres, that sounds very much like GGI - see the cube example for instance. Old news. I liked GGI a lot, but the features were never enough to give it much traction. Of newer projects, it reminds me of Luminocity - an experimental X window manager that supports bizarre stuff like windows that wobble when you move them, as a demonstration of new X server functionality.

    However, back to XML descriptions of user interfaces. There is one project that had immense potential, but that seems to have died...

    Entity (I've not been able to find a proper webpage for it - just the Debian package info pages). I really wish they'd kept developing it. Entity let you build a full app by writing an XML file describing the UI.

    You could define your own tags, and it had built-in support for javascript, Perl, Python and C scripting. The latter was handled by executing gcc on the fly and generating a DLL that was then loaded with dlopen() (it was cached, though...) - a bit impractical, but I loved the idea.

    The greatest demo application was an app that let you modify the in memory representation of the XML document, including the in memory representation of the editor itself, and see the changes in UI and behaviour immediately.

    Entity showed promise because it was flexible, small and extensible. Please someone pick it up. Please!

    The RDF data model and databases

    I've been thinking about the RDF data model a lot lately. Including reading up on SPARQL. Initially I didn't like it. However after a while it struck me that the RDF model + SPARQL actually matches most of what I do with databases a lot more than the relational model + SQL.

    The problem with the relational model is that I normally work with "resources" that consist of data items that are generally accessed at the same time and are tightly related, yet if I want a properly normalised relational database, I end up with insanely complex queries to gather all the information back together again.

    With SPARQL queries on RDF data, this suddenly becomes simplicity itself, because I'm not required to figure out a way to map the data I want to query about back into a single row per entity - instead I'm figuring out a way to find the triples I need, optionally providing a pattern for extracting just the data I want, or alternatively returning all the found RDF triples.

    The question is whether performance will be good enough - I haven't yet had a chance to experiment with a large scale RDF model.

    The ease of querying for a graph of data, as opposed to being constrained into a very simplistic row/column model is a compelling incentive to spend more time on it.
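    To make the contrast concrete, here is a throwaway in-memory illustration in Python - not a real RDF store, just the triple-pattern idea, with made-up data:

```python
# A tiny set of (subject, predicate, object) triples.
triples = {
    ("post:1", "dc:title",   "On HTTP abuse"),
    ("post:1", "dc:creator", "vidar"),
    ("post:2", "dc:title",   "Turtle parser update"),
    ("post:2", "dc:creator", "vidar"),
}

def match(pattern):
    """Return all triples matching a pattern; None acts as a wildcard,
    much like an unbound variable in a SPARQL triple pattern."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "All titles of posts by vidar" - no need to decide a row layout up front:
posts = {s for s, _, _ in match((None, "dc:creator", "vidar"))}
titles = sorted(o for s, _, o in match((None, "dc:title", None)) if s in posts)
assert titles == ["On HTTP abuse", "Turtle parser update"]
```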

    April 13, 2005

    Back from Paris - in love with Eurostar

    Just got back from Paris around 8pm. Left home around 6.30am this morning. It's great to just be able to zip in to Waterloo, sit down on a comfy train and wake up in the middle of Paris just a metro ride or two away from your destination (and unlike the bloody London Underground, the Paris Metro is a pleasure to use - for one, you don't have to walk for miles to change lines).

    Spent only around 4 hours in Paris, though, so no sight seeing. It's been about ten years since last time, so it's about time I get around to going for a weekend or so again.

    Coolest part: I just love the French trains - the TGV / Thalys trains parked at Gare du Nord look like props from an 80's movie...

    Worst part: When the metro ticket machine refused to accept my cards, my phone wouldn't roam, I didn't have a single Euro on me and the handle of my travel bag broke, all within minutes - luckily my cards did work at the bureau de change at the other end of the station (though at a horrific 8% commission), and I got an ugly but solid small suitcase back at the metro station for 30 Euros.

    April 12, 2005

    BNF -> parser assembler

    Ok, so tonight I set myself a goal of preparing an initial version of a BNF grammar with some extensions intended as a starting point for a tool to convert BNF into assembly for my parser assembler (I need to think of a proper name for it soon...). Here is the BNF, and some snippets of how I'm bootstrapping it. The hand converted grammar now parses the whole BNF for itself. Here's my BNF grammar:

    ; %triggers define a list of symbolic names for the triggers that the VM will call
    %triggers {
        LP = 1 . RP = 2 . CUT = 3 . CALL = 4 . EXP = 5 . TDEF = 6 . PRODL = 7 . PRODR = 8 .
        RULE = 9 . SUBL = 10 . SUBR = 11 . ORL = 12 . ORR = 13 . STORE = 14 . EXP = 15 .
    }

    ; '!' represents my "cut" operator. It breaks the VM with the string argument
    ; as an error if the remaining part of the rule fails
    bnf ::= triggers? production* !"EOF Expected" EOF .
    triggers ::= "%triggers" ws* "{" ws* tdef* "}" ws* .
    tdef ::= name ws* "=" ws* number ws* "." ws* /TDEF/ .
    production ::= name /PRODL/ ws* "::=" (ws* rule)* ws* "." ws* /PRODR/ .
    rule ::= sub_expr /RULE/ .
    sub_expr ::= or_expr ws* ("-" ws* /SUBL/ sub_expr /SUBR/)? .
    or_expr ::= store_expr ws* ("|" ws* /ORL/ or_expr /ORR/)? .
    store_expr ::= post_expr ws* ("->" ws* const /STORE/ )? .
    post_expr ::= cut | call | (primary_expr ws* (('?' | '*' | '+') -> #7)? /EXP/) .
    primary_expr ::= paren_expr | keywords | name | string | const | set .
    paren_expr ::= "(" /LP/ ws* !"Expected at least one rule inside parentheses" rule+ ws* !"Expected )" ")" /RP/ .
    cut ::= "!" -> #7 string /CUT/.
    call ::= "/" -> #7 (number|name) "/" /CALL/.
    const ::= "#" number .

    ; --- "Tokens"
    string ::= ('"' ["]* -> #6 '"') | ("'" [']* -> #6 "'") .
    keywords ::= ("ANY" | "EOF") -> #1 .
    name ::= ([a-zA-Z][a-zA-Z0-9_\-]*) -> #2 .
    number ::= (base10 | base16) .
    base10 ::= [0-9]+ -> #3 .
    base16 ::= 'x' [0-9a-fA-F]+ -> #4 .
    set ::= '[' ('~'? (any - ']')*) -> #5 ']' .
    ws ::= ' ' | #9 | #xD | #xA | ';' (ANY - #xA)* #xA .

    An excerpt of the parser assembler translation. I've tried making it match what I expect the BNF tool to generate reasonably well, but not exactly:

    :bnf
        kln $ws
        jsr $triggers
        kln $production
        cut "EOF Expected"
        eof
        ret
    :triggers
        req "%triggers"
        kln $ws
        req "{"
        kln $ws
        kln $tdef
        req "}"
        kln $ws
        ret
    :tdef
        req $name
        kln $ws
        req "="
        kln $ws
        req $number
        kln $ws
        req "."
        kln $ws
        trg #6
        ret
    :production
        req $name
        trg #7
        kln $ws
        req "::="
        kln $production_1
        kln $ws
        req "."
        kln $ws
        trg #8
        ret
    :production_1
        kln $ws
        req $rule
        ret
    :rule
        req $sub_expr
        trg #9
        ret
    :paren_expr
        req "("
        trg #1
        kln $ws
        cut "Expected at least one rule"
        req $rule
        kln $rule
        kln $ws
        cut "Expected ')'"
        req ")"
        trg #2
        ret
    :sub_expr
        req $or_expr
        kln $ws
        cmp #'-'
        bne $sub_expr_1
        eat
        cut "Expected expression"
        kln $ws
        trg #10
        req $sub_expr
        trg #11
    :sub_expr_1
        ret

    Don't have much time for explanations right now, but I'd be happy to answer questions.

    Why did I not know about the Opus 25 year anniversary collection?

    Opus/Bloom County/Outland is one of my all-time favourite comics. If you haven't read it, shame on you. If you're a fan but have completely failed to notice the 25-year anniversary collection, just like me (yes, I'm deeply disturbed by the fact that I overlooked it for this long), head over to The Official Berkeley Breathed Website.

    Copyright and patents are NOT property

    Dana Blankenhorn posted an excellent rebuttal to Jonathan Schwartz' latest tripe:

    "Let me respond as clearly as I can. I don’t believe in IP. Patents and copyrights are monopolies, allowed under the Constitution for limited times and for a limited purpose, to encourage the creation of more. Those I believe in.

    The phrase intellectual property does not appear in the U.S. Constitution, and for very good reason. The phrase is a lie. It turns ideas into land, and allows corporations who own the vast majority of patents and copyrights to control anyone who doesn’t serve them."

    Dana is absolutely right - The very term "intellectual property" is an attempt at perpetuating a myth created by owners of vast amounts of copyrights and patents.

    It is an attempt at a land grab where no land currently exists. Ideas used to be entirely unprotectable, for a very simple reason: There is no cost in duplicating an idea. There is no scarcity that justifies a perpetual "ownership".

    Copyright and patents in their modern forms were a compromise intended to stimulate the arts and sciences by making economic exploitation of ideas and expressions of ideas more profitable. Not an attempt at creating property.

    If corporations such as Sun want to keep pushing an agenda based on redefining their rights and responsibilities at the expense of the general public, perhaps it is time people started reading up on the origins of US corporate charters, and states' rights to revoke them (created exactly to protect the public against rogue corporations attempting to subvert democratic control).

    Plugged in to Microsoft's biggest rival

    Via Peter O'Kelly's Reality Check: The Seattle Times: Microsoft: Plugged in to Microsoft's biggest rival:

    "My belief is that open-source software is going to help drive the acquisition cost of software down toward zero," he said, a shift that will require software companies to move "over to a maintenance and support model."

    Strong words coming from a Microsoft employee...

    The problem for Microsoft, of course, is that maintenance and support is a market that is a lot harder to monopolize than software sales - and if they attempt to monopolize it in order to keep their margins, they won't be able to stay economically competitive as real competition drives down support and maintenance costs for the alternatives.

    April 11, 2005

    The fall of mass culture?

    In FAQ: Does the rise of the LT = the fall of mass culture? Chris Anderson asks whether the long tail - the increasing effectiveness with which we can address niches - means that mass culture will die.

    Some personal perspectives: I'm from Norway, a country with a population of 4 million. When I was a kid (early 80's) we had three TV channels: The Norwegian public broadcasting service, and the then two Swedish public broadcasting channels. There were satellite and cable services but very few had them.

    Everybody watched the same shows - I certainly do recognise that.

    Fast forward to the mid 80's and we got cable. And so did the neighbours. And lots of my friends at school. Not many channels - 20 or so. But the transformation was still large. The public broadcaster was (and is) still hugely popular, but significant changes immediately occurred in the viewing habits of many groups. Kids, for instance, were suddenly watching US and British cartoons, and teenagers started getting more of their musical input from Pat Sharp on Sky Channel rather than local DJs (I still find it amusing that a British channel was so vastly more popular outside the UK than in - most Brits don't seem to be aware of the old

    20 years on and the landscape has shifted drastically. More media, more segmentation.

    While I think there will always be room for some degree of mass culture, I agree that its influence will drop - the curve will flatten. But it's human nature to want to belong to larger groups, to want to be popular and to watch and listen to what others watch and listen to.

    I think the main change isn't that people don't want that any more, but that the sheer availability of media makes it incredibly much harder to cover the market effectively enough to build the kind of following you could easily get before.

    Semantic Web for Extending and Linking Formalisms

    I came across this paper by chance while searching for material on Z notation and RDF. It's an interesting read on the use of RDF for expressing languages used for formal methods.

    While reading it, something occurred to me (I'm sure it's not an original idea, but I haven't seen any implementations): It would be great to generate an RDF representation of source code. I'm tempted to spend some time considering whether there's an easy way of bolting RDF generation support on to my parser assembler.

    It would enable a rapidly growing number of tools that understand RDF to manipulate the code, and would be an interesting way of achieving some of the same as GCC-XML.

    With RDF mappings for UML and Z or similar formal languages, I'm sure someone could come up with interesting data mining tools based on combining various data (specifications, models, source) and cross referencing extracted data...
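    As a rough sketch of the idea, using Python's own ast module (the predicate names are invented; a real mapping would need a proper vocabulary, and URIs rather than object ids as subjects):

```python
import ast

source = "def greet(name):\n    return 'hi ' + name\n"
tree = ast.parse(source)

# Walk the syntax tree and emit (subject, predicate, object) triples
# describing its structure.
triples = []
for node in ast.walk(tree):
    for child in ast.iter_child_nodes(node):
        triples.append((id(node), "code:hasChild", id(child)))
    if isinstance(node, ast.FunctionDef):
        triples.append((id(node), "code:name", node.name))

# The function's name is now queryable as plain data.
assert any(p == "code:name" and o == "greet" for _, p, o in triples)
```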

    Serving XML: Pipelines and Filters

    Serving XML is a system for pipelining XML via SAX, passing the data through a series of SAX filters and/or XSL transformations.

    Unfortunately for me it's Java (I read Java, and write a bit when I have to, but I still don't like it). However, it's the concept that is interesting. SAX filters are pretty well established now, but what Serving XML brings to the table is a simple XML-based language for describing the pipeline, including features such as various content sinks - want your XML serialized to a file on an FTP server? Use the ftpSink.

    It also features filtering of which tags to select for further work, which is reminiscent of XSL, but with the flexibility of calling out to native language filters when/if you need to.

    The syntax of the pipeline descriptions looks quite straightforward, and very useful. If I hadn't had my plate full of far too many things at the moment I'd be tempted to implement something similar for C++ and/or PHP.

    I really love the entire idea of a transformation-centric approach to layering presentation and functionality. For me, it started out with parsing mail messages for a webmail platform I designed - it seemed so obvious to layer the parsing: a MIME parser at the bottom, then selectively a quoted-printable or base64 decoder, then code to filter out harmful HTML, detect URLs in plain text, rewrite URLs, mark quoted text, etc.

    All of it got implemented as tiny (20-30 lines in many cases) filter classes, and what could have been a big mess ended up fairly easy to understand.
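    Python's standard library makes the same pattern easy to demonstrate with SAX filters (a minimal sketch of the filter-chaining idea, not related to the actual webmail code or to Serving XML itself):

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

class UppercaseTags(XMLFilterBase):
    """A tiny filter: uppercase every tag name before passing it on."""

    def startElement(self, name, attrs):
        super().startElement(name.upper(), attrs)

    def endElement(self, name):
        super().endElement(name.upper())

out = io.StringIO()
reader = xml.sax.make_parser()
pipeline = UppercaseTags(reader)            # filters wrap the reader and chain
pipeline.setContentHandler(XMLGenerator(out))  # final sink: serialize back out
pipeline.parse(io.StringIO("<mail><subject>hi</subject></mail>"))

assert "<SUBJECT>hi</SUBJECT>" in out.getvalue()
```

    More filters stack the same way: each one wraps the previous stage and only overrides the events it cares about.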

    I've been in love with filters ever since. Check out Serving XML.

    April 10, 2005

    Essays marked by computer program

    I have sometimes wondered how boring it must be to grade papers that all follow the exact same patterns... But I must admit I didn't think we were quite that close to having software that can grade papers...

    (It's worth pointing out that this program apparently only supplements human grading: giving students feedback before handing the paper in, and helping the academic staff get a quick assessment so they can focus on detail.)

    I'd really love to see a proper paper on the accuracy compared to having the same set of papers graded by different humans.

    Unixica - A tribute to Unix people and culture is an interesting resource with a wealth of information and links related to the history of Unix, its inventors and other interesting people.

    April 09, 2005

    lowercase semantic web

    Related to my previous post, I found Lucas Gonze's entry on the lowercase semantic web via Marc's Voice. I'll definitely be following what Lucas writes.

    Lucas expresses some of the issues I've had with micro formats in a much more succinct (and perhaps less brutal) way:

    I don't mean to overcommit to this argument. The problem I'm having is emotional, I think. lowercase semantic web feels like yet another religion, which I don't want. I see lots of value in being strict about HTML semantics, however I don't yet see why I'd want HTML semantics everywhere.

    Against Micro Formats for the Semantic Web

    I just finished reading Danny Ayers's ramblings on MicroFormats, and while he comes out of it a bit conflicted, it in some ways helped me get a clearer view of my own position:

    MicroFormats using XHTML is to me what tables for layout was to older HTML.

    It is a semantic overloading that tries to apply meaning to tags far beyond their original intent, and while it is tempting because of the immediate advantages, I believe it will come back to bite people badly.

    One of the claims that often crops up is that the main advantage of using XHTML is that it is immediately stylable via CSS, yet the current major browsers have no problem styling plain XML. A quick look at my feed (via Feedburner) demonstrates that - Feedburner has done a wonderful job of it.

    If you need more than that, writing cross-browser javascript to use XSLT to transform the XML is trivial - my first attempt, which was also my first attempt at using "Ajax"/XMLHttpRequest and my first real javascript app, took me less than half a day to throw together. For that matter, if the page returned is static, specifying an XSL stylesheet works great across at least all Mozilla-based browsers, and IE if you're a little careful about how you write your XSLT.

    I just don't buy the convenience argument. I find a well defined XML vocabulary much more convenient, because it is often a lot clearer what the data represents.

    Personally I'm becoming a fan of GRDDL, because it allows us to pick simple XML syntaxes (or for that matter those obnoxious Micro Formats) and have an automatic way of deriving RDF, so that we can use the most convenient and most expressive syntax we want in the main document.

    I'm increasingly moving towards using XML + XSLT for publishing documents online as well. My first experiment was an online change request list for a team at work - instead of maintaining an HTML version, I moved to XML + client-side XSLT conversion exclusively. It works great, and resulted in a significantly smaller page (not that size matters for that particular application).

    XHTML for me is mostly something to be generated from other sources, not something I'd want to use as a source of document data itself anymore.

    As such, the Micro Formats seem to me a distraction at best, and a repeat of huge mistakes of the past at worst - instead, please give me a well-defined XML vocabulary; and if you must, please just write some XSLT to generate the XHTML.

    Reading Soul Music (Discworld)

    I've finally gotten some time to read again... I bought Soul Music the day my fiancee was in for her operation and I was left to wait for hours.

    Less than halfway into it so far. I love the way Terry Pratchett manages to take parts of history or literature and completely mess them up - if you haven't read any Discworld novels yet (do people like that exist?) then you should. Soul Music seems as good as usual.

    Death is definitely one of my favourite Discworld characters. How can you not love a grim reaper who is bored with his job, confused about his place in the world, and keeps trying so hard to fit in but never gets it right (such as his house being larger inside than outside, because he doesn't quite understand the limitations people are subject to)...

    The vivid descriptions of how the band steals a piano and gets past the night watch by pretending to be a speaking piano going out for a walk had me laughing out loud and made my fiancee give me strange looks (as she always does when I'm reading Discworld novels).

    (Yes, I should be reading for my exam, but I'm tired, and coughing and dripping all over the place, and I just found out my average for my assignments is 93% and the specimen exam is so easy I lost all motivation to spend more time reading... Unfortunately my next two courses will actually require me to read properly...)

    Turtle parser update

    My Turtle parser is getting along quite nicely. The parser bytecode itself still weighs in at less than a KB, and I've built a very simple RDF model and code to generate triples.

    As it stands it now passes most of the pre-requisite tests specified in the Turtle spec. Once it passes the full set, and I'm done with my damn exam, I think it's time to put up a page with the source code and some documentation.

    The RDF model is in no way suitable for production use, but I'll abstract out the interface used by the parser to add triples to it, so that it'll be easy to replace it with an adapter to a proper RDF model implementation.
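    The seam I have in mind looks roughly like this (a hypothetical Python stand-in; the real parser talks to the VM's trigger mechanism, and the names here are invented):

```python
class TripleSink:
    """The only interface the parser needs from an RDF model."""

    def add_triple(self, subject, predicate, obj):
        raise NotImplementedError

class ListModel(TripleSink):
    """Throwaway stand-in; an adapter to a proper RDF model
    would implement the same one-method interface."""

    def __init__(self):
        self.triples = []

    def add_triple(self, subject, predicate, obj):
        self.triples.append((subject, predicate, obj))

model = ListModel()
model.add_triple("_:a", "rdf:type", "foaf:Person")
assert model.triples == [("_:a", "rdf:type", "foaf:Person")]
```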

    The parser is still fairly messy, so after that I want to go back and improve the assembler to allow local labels and constants, and see whether I should make some changes to the opcodes. Then I have three candidates for what to continue with: a BNF parser to generate assembly for my VM; extending the Turtle parser to full N3 and writing code to generate assembly from the N3 BNF representation of the N3 grammar; or starting on an XML parser.

    I suspect I'll start with the BNF tool, as that would massively simplify the XML parser in particular. I've done BNF parser generation tools before, but never quite liked the approach I chose for generating events - I feel the trigger mechanism I've chosen now is fairly nice. The only thing I might want to do is make it easier to pass additional data from the parser to the trigger callback.

    April 08, 2005

    Paris here I come...

    ... for a day. I hate business travel. Absolutely hate it. I invariably end up with quick in and out trips - last time it was 3-4 hours in Munich and then back to the airport (that's the way I experienced Munich the 3-4 previous times as well). This time I'll probably arrive at the Eurostar just in time to get to a meeting, and have an hour or two to spare after the meeting is over (if I'm lucky) before I have to get back.

    But if I need to I need to.

    So much for telepresence - I want proper VR...

    April 07, 2005

    Parser assembler, RDF and Turtle

    I've kept working on my parser assembler, but it's moving slowly thanks to actually having a day job to do... However, tonight I mostly finished a parser for Turtle - a subset of N3 that allows convenient specification of RDF triples.

    It ended up at about 1KB of bytecode, which is quite reasonable. I still need to add some error handling, and then I can start writing some code to actually do some useful stuff with it.

    The current code uses an instruction to trigger callbacks from the VM on specific events, and it's proven to work very well. I'm still considering whether or not to add higher-level constructs, or possibly replace a couple of the current instructions, but I want to get more experience with it first.

    The current Turtle parser just triggers on each subject, verb and object, and on prefix directives.

    Turf online

    I tend to buy a lot of stuff online, with varying degrees of success, but I much prefer it to spending time running from store to store - especially when it's bulky (I don't drive, never bothered learning how, so I need delivery anyway). And now I've added buying turf for my front yard to the list.

    What was yesterday a complete mess is now a nice green lawn, thanks to one-day delivery anywhere in the UK...

    Now I just hope it survives...

    Redland (RDF) backend for Movable Type

    Kasei is working on a Redland backend for Movable Type - article and screenshots here

    This entry points to some of the promise of the semantic web: Harvesting data from the vast amount of semi-structured data stores already out there. Blogging software already collects data in nicely structured ways, and stores them in databases.

    Adding the ability to add slightly more semantic information comes at a very low cost in terms of time spent preparing the data.

    Adding the ability to query that data and to link information together using RDF comes at no extra cost in terms of human work once the software to do so exists.

    Marking up all kinds of static information may be interesting, but the promise lies in MT Redland and similar applications that help us take advantage of structure that is already there but not made accessible or interoperable.

    Imagine the untold terabytes of data tied up in databases that are not exported because there wouldn't be a simple way for people to make use of the data in a sensible way. Now connect it together, and the potential value of that data will in many cases skyrocket.

    'Wrong queue' for Star Wars fans

    I will never understand what makes someone so excited about a movie that they'll camp out for weeks just to see it on the first day or two. I can hardly even be bothered to buy a ticket in advance. So it is with a certain amount of glee that I read this BBC article: 'Wrong queue' for Star Wars fans

    If I had a signature evil laughter, now would be the time I would use it.

    (Yes, I know I'm being mean)

    April 06, 2005

    IKEA for the lazy (or short of time) offers a personal shopper site for IKEA in the UK.

    Interesting idea, as the one thing that annoys me with IKEA is actually having to go there - even though they do offer delivery, shopping there can be so annoying, and their delivery options are very limited.

    These guys will go to IKEA for you, get the goods, deliver them at a convenient time, and can assemble it for you if you pay extra.

    I've been wanting something similar for a long time: a company that I can have stuff I order delivered to, and that will pass it on to me on a schedule that works for me (i.e. precise time slots in the evening, or weekends only) - for those times when I order something from a company that insists on delivery sometime during the day and forces me to stay at home.

    Back from hospital

    Nothing serious - just accompanying my fiancee who had a routine operation. But it was an interesting experience. I was actually quite impressed with the quality of care.

    One of the interesting aspects of health care in Britain is that it is quite common to have private health insurance here, despite the fact that the public health care system is one of the best in the world - the care is of a more than high enough standard that the private insurance generally covers only expedited access. That is, if you end up on a waiting list you get to go private; otherwise there is little point, except perhaps more single rooms and tastier food.

    And despite the Conservative party and their rather pathetic attempts to use the NHS to drag down the government (not that I have any sympathy for Tony Blair, but I do have sympathy for the NHS staff who have to endure being the target of so much mud slinging), what I saw of Kingston hospital was great (except the way it looks - the architect deserves to be shot), and the quality of care was very good, with very attentive nurses and a consultant who didn't in any way seem rushed when going through the results of the operation with us.

    Considering I'm used to the Norwegian health care system, where hardly anyone goes private because it's generally not considered worthwhile given the high standard of care in the public hospitals (not that people don't moan, but I don't take much notice as long as most people, including those who can easily afford it, rarely bother with private alternatives), I've been pleasantly surprised by the NHS considering the bad press they frequently get.

    I realise there are problems here and there, but considering the number of patients they're handling that is inevitable. It is rather symptomatic of the British press that they always seize on an opportunity for criticism - baseless or not - but only very rarely write anything positive.

    I'll still consider private health insurance, but only because it's so cheap (a couple of hundred pounds a year) and I can easily afford it; and apart from the obvious self interest, I think the more those who can afford it go private the better - after all, it frees up the public system for those who can't afford the alternatives.

    April 05, 2005

    Are you living in a computer simulation?

    From Are You Living in a Computer Simulation?:

    This paper argues that at least one of the following propositions is true: (1) the human species is very likely to go extinct before reaching a “posthuman” stage; (2) any posthuman civilization is extremely unlikely to run a significant number of simulations of their evolutionary history (or variations thereof); (3) we are almost certainly living in a computer simulation. It follows that the belief that there is a significant chance that we will one day become posthumans who run ancestor-simulations is false, unless we are currently living in a simulation. A number of other consequences of this result are also discussed.

    A fascinating read.

    (And no, despite the hype, the Matrix was far from the first treatment of this kind of subject in sci-fi.)

    Tag ontology design

    Richard Newman has a writeup on his Tag ontology design including N3 notation for the draft ontology itself. The purpose is to create a generic way to associate tags with content with richer context than what is used for instance by Technorati.

    One of the advantages of Richard's design is that by using RDF, tags can be distributed separately from the content itself - tags occurring in an RSS feed (for instance from Technorati or ) and in your blog could assert that a certain post is tagged with a certain tag in exactly the same way, or third parties could publish collections of tags from their own categorisation, again in the same format.

    More over at Richard's blog

    My evening of theatre

    We had a nice evening out. Started out with dinner at my favourite Chinese place - New World. The best dim sum restaurant I've come across in London.

    Then we went to see the play - Hecuba, a Greek tragedy. In fact my first Greek tragedy. It was quite an enjoyable play, though I found parts of it emotionally "flat". The setting was very simple but effective - just two large patterned half-circle walls that were rotated and moved to change the mood. It was an interesting approach and really focused the attention on the actors.

    Maybe I'll write a proper review when I'm more awake... Time to go to bed - almost 2am here...

    April 04, 2005

    Error reporting during parsing

    While working on my assembler/VM for parsing, one of the main problems I'd overlooked (which is silly, considering how many parsers I've written that handle this problem) was error reporting. As it turns out, it's a fairly straightforward problem.

    I've written parser generators in the past that generated a high level language parser from BNF with some extra annotation (my first attempt was written in AmigaE about 10-12 years ago). The earliest approach I used was a way of marking where to stop backtracking.

    A recursive descent parser with limited lookahead has well-defined points where you will hit an error condition: anywhere you exceed the lookahead - backtracking past the lookahead threshold means you've encountered a parsing error.

    However, one of the things I don't want to do is limit a user to a specific amount of lookahead. At the same time, I want it to be easy to limit.

    The solution is to introduce an operator that marks a "point of no return": Once reached, attempts to backtrack past it will trigger an error.

    Incidentally, this is an idea I got from Prolog. Prolog is a logic programming language that depends on evaluating rules - possibly a significant tree of possibilities - and backtracking if a path fails. To cut down processing, and to direct the search for a solution, it has something called the "cut operator". The cut operator "prunes" the search tree by marking a similar kind of point of no return: once Prolog has reached that point, it shouldn't consider further rules.

    A recursive descent parser is a (usually hard-coded) decision tree, and using a cut operator saves you from evaluating all possible alternatives when you know they can't possibly work. A typical example is a parser that accepts only numbers and alphabetic names - once you've matched the first character you KNOW the other rule must fail, so it makes no sense to let the parser try to evaluate it. In that simple case it doesn't make much difference, but if you have hundreds of rules, it can really make a significant difference.

    More than that, however, it makes it easy to flag what went wrong - by attaching an error code or a string to my cut operator, I have a simple mechanism for flagging a human readable error.

    I'm letting the VM track column and line numbers, and I am considering also setting a default error string for each label jumped to, so that it becomes simple to automatically generate reasonable standard error messages whenever I insert my cut operator.
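    As a hand-written illustration of the numbers-versus-names example above (plain Python rather than my VM - every name here is invented), the cut looks something like this:

```python
# Sketch of a "point of no return" in a recursive-descent parser.
class ParseError(Exception):
    def __init__(self, msg, pos):
        super().__init__(f"{msg} at position {pos}")
        self.pos = pos

def parse_number(s, pos):
    if pos >= len(s) or not s[pos].isdigit():
        return None                 # soft failure: caller tries the next rule
    # Cut: we've seen a digit, so no other rule can possibly match now.
    end = pos
    while end < len(s) and s[end].isdigit():
        end += 1
    if end < len(s) and not s[end].isspace():
        raise ParseError("malformed number", end)   # hard failure past the cut
    return s[pos:end], end

def parse_name(s, pos):
    if pos >= len(s) or not s[pos].isalpha():
        return None
    end = pos
    while end < len(s) and s[end].isalpha():
        end += 1
    return s[pos:end], end

def parse_token(s, pos=0):
    for rule in (parse_number, parse_name):
        result = rule(s, pos)
        if result is not None:
            return result
    raise ParseError("expected number or name", pos)
```

    Returning None is an ordinary failed alternative the caller can recover from; raising past the cut aborts the parse with a position and a human-readable message, which is exactly the error-reporting behaviour described above.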

    So, is my number of operators blowing up? Not really. Over the original instruction set I've added a NOP (no operation) operator, and BLT (branch on less than) and BGT (branch on greater than) operators. I've also added a range comparison operator, but I'm of two minds about it - it can easily be implemented in the assembler alone and be made to emit multiple opcodes. That's the approach I've taken with quite a few other operators I've added to the assembler - including "KLN" for "kleene star" (matching a production zero or more times).

    For the most part I think I will try to keep the VM as minimal as possible - at least until I have some profiling data for some test parsers to use for determining what operators truly deserve to be optimized.
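    The assembler-only approach is just macro expansion. A sketch of what expanding the kleene-star construct might look like - the opcode names below are invented for illustration and don't correspond to the real instruction set:

```python
import itertools

_labels = itertools.count(1)   # unique label suffixes

def expand_kln(body):
    """Expand KLN(body) - match a production zero or more times -
    into primitive label/branch/jump opcodes."""
    n = next(_labels)
    top, out = f"kln_{n}", f"kln_end_{n}"
    return ([("LBL", top)] +        # loop entry point
            body +                  # try the production once
            [("BNE", out),          # didn't match: leave the loop
             ("JMP", top),          # matched: go round again
             ("LBL", out)])
```

    The VM never sees KLN; the assembler emits only primitives, which keeps the instruction set minimal.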

    I spent a sizable chunk of the weekend (when I should have been reading for my exam) cleaning up the code, and it's getting time to post it, I think, even though the opcodes and assembler syntax will remain unstable for some time. Stay tuned!

    A definite lack of culture

    Going to see the play Hecuba today. The last time I went to the theatre must have been at least five years ago. It's strange that after moving to London, with all the theatres and galleries almost at my doorstep (particularly the first couple of years I lived in London, when I was living at Marble Arch), I've used practically none of the opportunities.

    April 03, 2005

    Exam time yet again

    When I quit uni ten years ago for my first startup, I thought I'd had my last exam... But now exam time is rapidly approaching again. Only a single course this time around, and an easy one at that, but I still need to read. And a week afterwards I'm at it again with the last two modules of my MSc before my dissertation.

    I started studying again part time in May of 2003, and so far I've found it quite enjoyable. True, I'm hopeless when it comes to delaying everything until right before I need to hand it in, and it's a lot of work, but it's interesting as well - at least in the case of the courses I've chosen in order to actually learn something new, as opposed to the few I've picked just to take advantage of my work experience.

    In fact I've more or less decided to go on studying once my MSc is done. I've not quite decided whether to go for an MBA right away, or to take a BSc in finance or economics first. But in any case I'm likely to end up studying for the foreseeable future. Not so much because I actually see an immediate need for it career-wise - I've got enough experience that I've never had to defend not having a degree before, and once I get the MSc that will effectively put the question to rest anyway. It's more a matter of expanding my knowledge of areas related to what I currently work with.

    For the last few years my main focus has been billing systems and e-commerce, and there are large gaps immediately outside my field of responsibility, for instance when it comes to accounting and reconciliation, and business forecasting. While I certainly don't want too much in-depth knowledge of those areas, some courses would help me better understand the concerns of the people I deal with on a daily basis.

    Another option I've been thinking about is taking either an LLB (British law) or part of one, in order to have a significantly stronger foundation with regards to the legal aspects of what I currently do. Would beat having to run stuff past legal all the time :)

    But for now, it's exam time again, and I'm mostly writing this just to have an excuse not to read the most boring textbook ever. It's actually not that bad, except that it's about e-commerce and distributed computing, and from what I've seen I know many of the areas discussed better than the textbook author does, so it's mind-numbingly boring (and quite annoying whenever I have to read through stuff that I know is inaccurate).

    An intuitive explanation of Bayesian reasoning

    As a follow-up to the entry on the PHP Naive Bayes package, here is a page I found that explains Bayes' theorem with a lot of detailed examples: An Intuitive Explanation of Bayesian Reasoning:

    Your friends and colleagues are talking about something called "Bayes' Theorem" or "Bayes' Rule", or something called Bayesian reasoning. They sound really enthusiastic about it, too, so you google and find a webpage about Bayes' Theorem and...

    It's this equation. That's all. Just one equation. The page you found gives a definition of it, but it doesn't say what it is, or why it's useful, or why your friends would be interested in it. It looks like this random statistics thing.

    So you came here. Maybe you don't understand what the equation says. Maybe you understand it in theory, but every time you try to apply it in practice you get mixed up trying to remember the difference between p(a|x) and p(x|a), and whether p(a)*p(x|a) belongs in the numerator or the denominator. Maybe you see the theorem, and you understand the theorem, and you can use the theorem, but you can't understand why your friends and/or research colleagues seem to think it's the secret of the universe. Maybe your friends are all wearing Bayes' Theorem T-shirts, and you're feeling left out. Maybe you're a girl looking for a boyfriend, but the boy you're interested in refuses to date anyone who "isn't Bayesian". What matters is that Bayes is cool, and if you don't know Bayes, you aren't cool.

    Why does a mathematical concept generate this strange enthusiasm in its students? What is the so-called Bayesian Revolution now sweeping through the sciences, which claims to subsume even the experimental method itself as a special case? What is the secret that the adherents of Bayes know? What is the light that they have seen?

    Soon you will know. Soon you will be one of us.
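    The confusion between p(a|x) and p(x|a) that the essay pokes fun at largely dissolves once you write the theorem down as code. A minimal sketch, with illustrative numbers of the kind used in the essay's screening example:

```python
def posterior(prior, p_x_given_a, p_x_given_not_a):
    """Bayes' theorem: p(a|x) = p(x|a)p(a) / p(x),
    where p(x) = p(x|a)p(a) + p(x|~a)(1 - p(a))."""
    p_x = p_x_given_a * prior + p_x_given_not_a * (1 - prior)
    return p_x_given_a * prior / p_x

# 1% base rate, 80% true positive rate, 9.6% false positive rate:
# the posterior is only about 7.8% - the low base rate dominates,
# which is exactly the kind of result that trips people up.
p = posterior(0.01, 0.80, 0.096)
```

    p(x|a) and p(a) only ever appear in the numerator; everything else is just normalisation.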

    April 02, 2005

    Programming and inspiration

    Sometimes I get inspired and can write thousands of lines of a program without even an upfront design, just based on an idea, limited only by how fast I can type. Like when I was playing with my parser assembler language earlier this week (my remaining problem is what to do about error handling). This is not one of those times.

    Other times, I can stare at a screen and be completely unable to get anywhere, even though I know exactly what is needed. Tonight the cursor is just taunting me. I've been playing with the naive Bayesian filter in PHP that I mentioned the other day, and it's simple code - I know exactly what I need to do to achieve what I want, but it's just taking forever to write.

    I hate days like this...

    Style: SWT port to C++

    I came across Style at Freshmeat:

    You gave a hard cold look at CPLAT, FLTK, Juce, NoWait, Qt, Toad, VCF, WxWidgets and ZooLib and you came back with the idea that C++ Cross Platform UI toolkits might be fine for business applications, but that you would never consider using any of those for end-user applications?

    You thought that all those screen-shots were great, except that they didn't convey the user experience (or the lack thereof), and most importantly, lacked style?

    From your C++ vantage point, you've peered at Java's Swing or even better, at IBM's SWT, and saw what really responsive, and host platform friendly user interfaces should look like?

    If, like me, you have been annoyed by the lack of attention to the little things in UI development, that just drive users crazy, then collaborating to Style may be for you.

    Style is the ongoing C++ port of IBM's Native Look and Feel SWT for Java, itself deriving from IBM's VisualAge for SmallTalk, with two major twists.

    I love the idea. SWT always seemed to me to be a great step forward for Java in that you'd finally be able to drop the butt-ugly Swing look and get something that looks (and behaves) like a native application.

    I'm really happy to see a project to provide something similar for C++. Note, though, that this project so far only has OS X code - the GTK and Windows ports are only at the planning stage.

    One of the reasons I'm posting this is in the hope that someone will volunteer to help this guy out...

    An alternative view on the pope

    It sickens me to see the attention the pope's health gets. Not least because he himself has shown such blatant disregard for the health of the millions of people he has sentenced to death with his continuing hardline stance on condom use.

    Millions of people are dead or dying as a result of HIV and other STDs who would have been healthy had they OR their partners not believed they'd go to hell if they did not listen to him.

    It is easy to try to dismiss this as a case of personal choice, but when someone is in a position where hundreds of millions of people believe this person is their direct link to "God", it is too simplistic to try to ignore the hold that person has on the choices made and pretend this person does not have any kind of responsibility for what he tells people.

    I will not grieve for the pope. I will show the same lack of concern for his life as he has shown for others.

    DIY hell - again...

    But I've finally finished the living room painting and flooring, so at least parts of the house are starting to look decent. Just missing some skirting board, a few book cases and a nice, big, plasma TV and I'll be happy with the living room ;)
    Mostly, anyway.

    SCO: Crash and burn

    Over at GROKLAW
    you'll find PJ doing her great analysis work again - this time on SCO's 10K, which they've finally delivered.

    The overall impression is that it would suck really badly to be a SCO shareholder at this point. Not that it hasn't before. But this 10K is some of the most negative reading I've ever seen.

    Beautiful. Maybe we'll get rid of the vultures soon.

    TSA lied about Secure Flight

    Joi Ito has a very disturbing entry about the US Transportation Security Administration, and their passenger profiling efforts: It's Official: TSA Lied

    What disturbs me the most about these programs is the secrecy and the fact that there's been no form of independent assessment of whether or not they work (and little to no reason to believe they will).

    It also exemplifies a kind of ongoing hysteria over terrorism that is not in any way warranted given the low number of deaths from terrorism (even when including 9/11) compared to things like natural disasters, traffic accidents, cancer etc.

    So why are governments so up in arms about terrorism?

    Because it lets them get away with things they wouldn't otherwise, such as the Afghanistan and Iraq wars, massive breaches of privacy, imprisoning without due process, and more.

    Exaggerating fears to get popular support for oppression has long been a well established method for governments, including a wide range of dictatorships, for restricting liberty and getting a closer grip on their populace.

    Now, let's assume that the current governments of these countries won't abuse these laws - what is to say we won't see a government that WILL abuse them after the next election? 10 years from now? 20?

    Laws limiting the government's power aren't there primarily to protect against decent people. They're there mainly to protect us in case someone in power turns out to be prepared to step over the line.

    In that respect, it's worth remembering that Hitler came to power via elections. Lots of manipulation, coercion and scare tactics, yes, but he is proof that a democracy is not immune to rapidly descending into dictatorship if there aren't sufficient checks and balances.

    You don't need someone like Hitler in power to do significant damage. Someone a fraction as evil, or just plain misguided, will do just fine at ruining innocent people's lives or drastically reducing liberties in ways that actually affect people.

    I think the most important demonstration of how disingenuous the current focus on terrorism is can be found in the UK: for decades the UK faced IRA terrorism. I lived here when the last two bomb blasts in London happened. Curbing civil liberties to fight it was never a serious likelihood. Not even after the Brighton bombing, which in the worst case could have taken out most of the cabinet, including then prime minister Thatcher.

    Instead, now - after years without bombs going off in Britain, and with only very limited terror threats in Northern Ireland - suddenly the government needs the power to intern and restrict suspects it doesn't even think it could convict, for lack of evidence.

    What's wrong with that picture?

    Is SAJAX needed?

    Sam Ruby says Sajax is still unsafe

    While I haven't looked at the safety aspects of it, I have to ask whether it's needed at all.

    When Gmail launched I'd already been interested in this approach for a while, after seeing stuff like Netwindows.

    It took me - with fairly basic Javascript skills - just a couple of hours to put together a fairly advanced test application.

    A small javascript function library and you have a fairly solid base on the client side.

    On the server side I hardly saw much need to add any special code - it was no different from writing any other PHP code in my case, with proper request handling, except that I was returning XML instead of a complete webpage.

    For me, the main revelation was that I could make truly clean web apps by cleanly separating visual presentation from logic - wrapping up all the latter, and providing the former as a CSS stylesheet and a set of XSL transformations that I could easily apply server side for "old style" clients.

    Combined with hrefs pointing at PHP scripts as a fallback for the actions that browsers with Javascript enabled would do client side, it's actually quite easy to write these kinds of apps in a way that allows them to work in any browser.

    But that's a digression. What I want to point out is that Sajax isn't really solving much for me - the disadvantage of being pushed into a particular processing model outweighs the trivial advantages over what I get from the 50-100 lines of javascript I cobbled together for my tests last year, combined with "business as usual" on the server side.
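    The server-side half of the approach I'm describing - one handler, returning raw XML to script-capable clients and rendered HTML to everyone else - can be sketched like this (in Python rather than the PHP I actually used; the names are hypothetical, and the trivial string transform stands in for a real XSL stylesheet):

```python
from xml.sax.saxutils import escape

def render_items(items, xml_client):
    """One handler, two renderings: raw XML for clients that fetch via
    XMLHttpRequest and render client side, full HTML for old-style
    clients (in practice the HTML branch would apply an XSL transform)."""
    body = "".join(f"<item>{escape(i)}</item>" for i in items)
    xml = f"<items>{body}</items>"
    if xml_client:
        return "application/xml", xml
    html = "".join(f"<li>{escape(i)}</li>" for i in items)
    return "text/html", f"<html><body><ul>{html}</ul></body></html>"
```

    The point is that the application logic stays identical either way; only the last rendering step differs, which is why I didn't feel the server side needed any special framework.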

    The ownership of data

    Ian Forrester over at has written about data ownership and its relationship to the vs debate.

    While I must confess to being amongst the heathens who actually haven't used either yet (though I've had a quick peek), I have strong feelings about data ownership, and that is perhaps one of the main reasons I never used the first time I looked:

    Access to my data - not just via basic interfaces, but in forms I can easily download in batch and manipulate or move to a competitor - is a key thing for me.

    It's one of the things that makes me keep my main mail account locally instead of with a free mail service.

    It's one of the things that makes me avoid applications that use closed file formats like the plague.

    It's one of the things that makes me willing to pay for a service I could otherwise get for free.

    Data ownership is going to become a competitive issue as more people start running into the consequences of not having control of data that they depend on.

    It's not just about a company going out of business, but about conflicts (do you disagree about payments? price levels?), the ability to move to a product with better features (if you're so sure you have the best offer for me, why do you try to make migration hard?) and the ability to reuse MY data in ways not supported by the platforms hosting it.

    I'd happily pay extra for a solution where I have full access to my data. And while I believe "normal" users probably don't care all that much, I believe there is a significant market opportunity for companies to provide this kind of service, as there's potential for significantly higher margins.

    And the funny thing is, while I want to know I have the ability to move easily, and while I would use the chance to back my data up on a system I have control over, I am an extremely loyal customer. I'm not price sensitive, and I HATE the hassle of moving services, and the ONLY thing I expect from a service provider that provides a decent quality service in order for me to stay is transparency - when something goes wrong I want them to tell me, and be honest about it.

    That's the main reason I've stuck with my current broadband provider, for instance (if you're in the UK, check out - their customer service has always been outstanding, and their support system is the best I've seen). They seem to be quite good value, but thanks to their customer service I haven't even bothered to look at other prices for the last 2-3 years.

    It boils down to trust. I'll trust an online service that gives me full access to the data it holds for me, in a format I can actually use.

    They can feel free to charge me for that - it's worth it.

    Another 24 hours of ignoring the news.

    I just find April Fools' annoying. It had some element of fun as a child, when you'd only see a couple of dubious news entries all day. But these days you can't trust anything on April Fools', so what's the point of reading news at all today?
