Welcome! This is an ARCHIVED page from my old blog


2005-03-31 21:40 UTC La Vida Robot


La Vida Robot

Wired 13.04: La Vida Robot:

How four underdogs from the mean streets of Phoenix took on the best from M.I.T. in the national underwater bot championship.

A great read, whether you like robots or not.

C++ and reuse

Over at the Manageability blog there's an interesting entry titled Manageability - Google's Coding Culture and C addressing the lack of uniformity in C++.

There are a few points I'd like to discuss in that regard.

*update*: I also posted a rather lengthy comment to the entry linked above...

One of the main things separating C++ from many other languages, whether it is Java with a large "standard" body of code, or Perl with CPAN etc., is the method of disseminating new code.

In the C++ world, the rule is very much a "bazaar" style of development of libraries - everyone is welcome to the party, and people are hawking their wares at every street corner and sometimes in the middle of the street.

This has its advantages and disadvantages. Among the disadvantages are the issues mentioned in the entry above: There are many things for which there are no standard ways of doing things. There are often a multitude of libraries doing the "same" thing. There are many coding styles.

However, what that fails to recognise is that the situation is like that to a large extent because that's what people want. Not that people want chaos, but people want different things.

Many of us who dislike Java do so exactly because it forces or at least nudges us into patterns we don't want to follow, and because it constrains us to programming models we don't like.

The multitude of C++ libraries doing the same thing comes from a variety of reasons:
- The standard is limited. It is limited because every standards-conformant compiler is expected to offer everything in it. That is, sockets support is considered inappropriate because not all systems can support it, and so on. This is a very different approach from Java's.
- People don't know about each other's efforts, but decide to keep going when they find out because of different goals/features/needs.
- Different goals, features and needs are a big driver: There are trade-offs in anything you do, and where the Java approach is to put one approach into the standard, the C++ approach is that if there is no clear consensus on the way to do things it doesn't belong in the standard.

As it is, that leaves C++ with a very powerful but also very limited foundation, with the STL (which IS part of the standard), iostreams and the remaining bits and pieces forming a generic base, and above that you are free to pick and choose.

But look around at various open source projects, and you will quickly see some sets of "standard" libraries popping up everywhere.

The C++ world is fragmented, but there is extensive reuse.

While it's easy to say C++ would be better off with a larger standard library, I think that is a two edged sword - many of the people using C and C++ do so because they only "pay for what they use": You don't get a whole lot of stuff with a basic setup that may not be appropriate for your system.

A lot of these people would not use C++ if it grew into the huge standard that Java has become, because it would no longer fit what they are looking for.

The very strength of the C/C++ legacy is the huge amount of code that is out there in forms that are reasonably easily reusable - it just isn't as neatly packaged as for some other languages.

Reuse in C/C++ is just a whole lot more focused around finding code that works for you (has the right space/time trade-offs etc.) and is under the right license.

Part of this is also a result of the sheer age of the C/C++ community - a lot of code dating back to the 80s, and code with roots further back, is still not only in use but also being reused in new systems.

(One classic example is wildmat.c written by Rich Salz in 1986 and still being reused mostly unchanged in new code these days)

BlogPulse Conversation Tracker

Via Mike Linksvayer: BlogPulse Conversation Tracker is a specialized search that attempts to build a view of the "conversation" that is created by people commenting on a blog entry around the web.

This is the kind of application that would be so much easier with widespread use of the previously mentioned Thread Description Language, by explicitly annotating the pages to describe the relationship of the posts and comments to posts.

Instead of having to search, it becomes a simple matter of traversing the links in the documents. A Firefox extension that presents a tree view based on TDL annotation would be great... Unless someone else gets to it first perhaps it's time to spend some time experimenting (please, let someone else get to it first, I'm spreading myself way too thin these days... :) )

Greasemonkey as a lightweight intermediary

Via Intertwingly:

Jon Udell wrote an entry called The architecture of intermediation on how he as a user would like a way of adding features to a web application.

Simon Willison followed up with this article on using Greasemonkey as a lightweight intermediary

He's using Greasemonkey as a tool for annotating webpages and storing the data on a central system.

I've been thinking about something similar for a while: I read/write some German, French, and bits and pieces of Dutch, Italian and Vietnamese. The problem is that I have far too little time to spend on studying these languages, and it's far too tedious to read extensive texts in them.

Looking up words in a dictionary all the time is too tedious too. What I'd like is a way for me to a) annotate a page with notes on words/grammar etc - Simon's article has lots of great pointers, and b) automatically pull down a list of dictionary definitions for words I haven't indicated I know well enough, giving me easy access to definitions.

I think a tool like that would let me spend a lot more time reading these languages...

March 30, 2005

PHP Naive Bayesian Filter

Thanks to Bitflux Blog for this link to a PHP Bayesian filter.

The linked page also points to James Seng's plugin for Movable Type to do Bayesian filtering of comments.

In case you don't read French, I've done a quick (and rough, my French is bad - I need to use it more) translation (feel free to correct me in the comments...):

This is about filtering comments, pingbacks or other trackbacks to your site. I don't play much with that, but the idea of a filter based on the Bayes theorem intrigued me too much to resist doing a PHP implementation.

Simple and efficient

The Bayes theorem is a simple relationship between probabilities. For example if you have a document and two categories, spam and nonspam, it is difficult to learn the probability that the document belongs in one category or another directly. On the other hand it is simple to learn them by analysing each word of the document.

For the theory, a simple search on Google for "naive bayes theorem" gives you numerous references. And if English doesn't stop you, you ought to read Machine Learning in Automated Text Categorization by Fabrizio Sebastiani. If you prefer Perl to PHP, look at the CPAN modules of Ken Williams like Algorithm::NaiveBayes.

The appeal of the naive Bayes algorithm is that it is fast and broadly useful. You could for example use it for the classification of comments on your site. For example, see the filter for MT that motivated me to do it all in PHP.

Utilisation in practice

In the archive you will find a script which allows you to train your database and make a query. It is meant for integration into a larger system like your blog system.

At first, use the file "mysql.sql" to initialise the database. You should afterwards use the script to create at least two categories, for example "spam" and "nonspam". Afterwards you must train the filter a bit before testing it.

Important functions:

1. train() : To train the filter
2. untrain(): To untrain the filter
3. categorize() : To classify a document
4. updateProbabilities() : To update the probabilities in the database after a series of train() or untrain().

The use of categorize() does not add any information to the database. It only returns the result of the probability calculation.
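To make the idea a bit more concrete, here is a minimal sketch in Python of how a naive Bayes filter of this kind can work. It is not the PHP library's code: the class name, the in-memory dictionaries (the PHP version stores its data in MySQL) and the whitespace tokenizer are all assumptions made for illustration. Only the method names train(), untrain() and categorize() are borrowed from the list above, and probabilities are computed on demand, so there is no counterpart to updateProbabilities().

```python
import math
from collections import defaultdict


class NaiveBayes:
    """In-memory sketch of a word-count based naive Bayes classifier."""

    def __init__(self):
        # category -> word -> number of occurrences seen in training
        self.word_counts = defaultdict(lambda: defaultdict(int))
        # category -> number of training documents
        self.doc_counts = defaultdict(int)

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase and split on whitespace; a real filter would do more.
        return text.lower().split()

    def train(self, category, text):
        self.doc_counts[category] += 1
        for word in self.tokenize(text):
            self.word_counts[category][word] += 1

    def untrain(self, category, text):
        # Reverses an earlier train() call made with the same arguments.
        self.doc_counts[category] -= 1
        for word in self.tokenize(text):
            self.word_counts[category][word] -= 1

    def categorize(self, text):
        """Return {category: log score}. Read-only: nothing is stored."""
        vocabulary = {w for counts in self.word_counts.values() for w in counts}
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for category, docs in self.doc_counts.items():
            if docs <= 0:
                continue
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Start from the prior, then add one log-likelihood term per word.
            score = math.log(docs / total_docs)
            for word in self.tokenize(text):
                # Add-one smoothing; the extra +1 reserves a slot for unseen words.
                score += math.log(
                    (counts.get(word, 0) + 1) / (total_words + len(vocabulary) + 1)
                )
            scores[category] = score
        return scores
```

After a handful of train() calls per category, the category with the highest score from categorize() is the filter's guess; working with log scores avoids underflow when documents get long.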

Update: Replaced the Machine Learning URL with a working one provided by Audun. Thanks!

Florian Mueller to give up software patent fight

From ZDNet UK: Anti-patent campaigner hangs up his gloves

Florian Mueller has been a tireless campaigner over the last year, and together with the people at FFII he's managed to create enough difficulties for the pro-patent lobby that the directive would have been dead if the EU Presidency had stuck to the EU Council's procedures instead of going to extreme lengths to quash any democratic legitimacy the council had.


We all owe him a great deal of thanks for the work he's put in, and I hope his game project is a great success.

In the meantime, we still have the FFII, it's still possible to kill this directive, or even fix it.

The Temporal content of Web pages

OWL-Time is an OWL ontology for describing temporal aspects of web pages or web-services.

One very useful aspect of it is that it's fairly readable and well documented, and comes with several example files - as such it's a great way of getting more familiar with OWL.

Bootstrapping assemblers/compilers

I know perfectly well what to do and not to do, yet I keep getting bitten by this anyway:

Do NOT change the language without first taking a copy of a working environment... NO, checking everything in to CVS/Subversion is NOT sufficient - you need to verify you have a working copy.

There is nothing worse than making a change to your compiler/assembler/parser/whatever, trying to rebuild it with itself, and discovering that the change you thought was entirely benign in fact broke the damn thing, leaving you stuck with something written in a language that doesn't exist anymore (since you just modified and broke the only translation tool).

Luckily, when I did this last night it wasn't too bad - I just had to rewrite about 20 lines of my assembler and dig out an older version of the parser to be able to rebuild it again, then change the lines back, correct a bug and rebuild it once more (after taking copies this time)...

But why will I never learn this lesson once and for all? I've written at least a dozen translators that have been bootstrapped to use code written using itself before, and it seems that every single time I forget about this sooner or later.

Thread Description Language

Thread Description Language (TDL)


TDL is an RDF vocabulary for describing threaded discussions, such as Usenet, weblogs, bulletin boards, and e-mail conversations.

So what could it be used for? One obvious thing would be to enable client software to access web based message archives without having to care about scraping the HTML to see if all the header information is there - just embed the RSS in a page.

Another would be as a uniform way of storing meta data about messages and their relationships, or exchanging that information with other applications.

Mozilla AOM

Ever wanted to program XUL applications or extend Mozilla? Mozilla Object Reference covers the Mozilla Application Object Model with lots of reference material and tutorials.

(AND it contains an RDF model categorising all the data and a sidebar extension for Mozilla to browse it - a nice demo of some of the stuff you can use RDF for.)

Miss having a cat

There is something extremely compelling about Wil Wheaton's writing. Even when writing about how he has to let go of his cat to save it from further suffering, he manages to do it in an upbeat way, by telling the story of how they found him in the first place, that immediately made me miss having a cat. Even if the bad thing about having an animal like a cat is that you inevitably outlive it and have to let go.

The last cats we had were when I was still living with my parents.

We got this grey, violent, extremely aggressive furball of a female that my parents got talked into taking by the previous owners, who had to get rid of her due to an allergy.

She was the most brutal cat I've ever known, frequently hiding in the berry bushes waiting for birds. Not sparrows, or similar tiny things, but magpies, crows etc. Once we found 3-4 of them under a bush - she never bothered eating them.

Another time a magpie was teasing her on the lawn, seemingly at safe distance. It reached maybe a meter up into the air before our cat was sitting on its back and forcing it to the ground. Then she let go and waited for it to take off again before jumping on top of it yet again.

After we'd had her for a while, we ended up with another litter of kittens. We kept one of them, a tiny little male that was entirely black.

He was shy and passive from the start, and as he was growing up his mom was quite a bit too watchful - she used to hide behind the curtains and hit him with her paw whenever he passed by (he never learned to spot her or avoid her - silly cat).

You couldn't help feeling sorry for him, and I think his demeanor and the way he was treated was what made me so much more attached to him than his mother or the other cats we'd had over the years.

His mother did keep trying to teach him to hunt and kill but he just wouldn't learn. I don't think he ever did - he was more dependent on his humans than any other cat I've known.

One winter his mother didn't return from one of her nightly trips. Probably run over by a car, but we never found out for sure. With any other cat I'd say a fox might have been a possibility but given how ferocious she was, I'd pity the fox that would have tried attacking her.

Sad as it was, after that our black cat "Svarten" ("svart" is black in Norwegian - not very original) started coming into his own. He livened up, though he was still a real coward.

At the time my parents had a bird, and my brother a couple of rabbits. You'd think they'd have a hard time keeping the cat away, but not so - in fact, once one of the rabbits was left out of its cage, home alone with Svarten. We suspect the rabbit got cornered and gave him a real kick or two, because after that he was afraid of rabbits too...

Whenever the bird was let out, he was even stranger, refusing to even look in its direction. If you tried getting him to, he'd turn his head away. Fighting temptation perhaps...

A few years after he was born, my parents for some reason decided to get a dog. I still don't understand why, considering the number of animals they had.

I was so angry when I found out, because it turned out Svarten was not the kind of cat you try to get to live with a dog - he promptly moved out and in underneath the house, and stayed there for many years, only coming in for visits when I was around and the dog out of sight - always carefully looking around to see if he was safe.

He was extremely affectionate those times whenever he decided it was safe enough - insisting on curling up on my chest purring when I was going to bed.

A couple of years after I moved out, he finally got up enough courage to move back in, but by then his best years were over. He soon after started developing liver problems, and one weekend I went back to visit he was gone. They'd forgotten to tell me they'd had to let go...

March 29, 2005

Parser assembler update

An update on my assembly language for parsing. I now have a parser for the assembly language written using the bytecode it will generate that will parse the full syntax. Total bytecode size?

About 400 bytes.

(*UPDATE*: Ok, so it ended up at 507 bytes. Still pretty good and it will be smaller once I've added some of the enhancements to the instruction set. Though I'll have to add some error reporting - that will probably bring it up to around 1K)

Apart from the additional features I've mentioned earlier I will probably need some additional error handling functionality as well, in order to make it easy to give proper error messages.


South Korea to promote Linux use

CNET News.com reports that South Korea is to promote Linux use.

It makes sense for governments to be pro Open Source. Regardless of the cost issue, open source has the advantage that it guarantees open access - in a democratic society creating barriers to participation is a significant issue.

Locking people into undocumented data formats threatens participation particularly in poor countries, but does also cause a significant archival problem once the vendor withdraws support.

However it also provides an important possibility to grow the local service and development industry.

Even if Microsoft turned out to be right, and OSS turned out to be more expensive than their software, that wouldn't change the fact that the choice is between funnelling money into the coffers of a US company and funnelling money into the wallets of local software engineers and IT consultants, who in turn will pay a significant chunk back in tax and spend a significant chunk of the rest in the local economy.

These are the two reasons I think should be the most important for governments looking at open source - the cost of Microsoft software is much larger in terms of reduced opportunities and investments in the local economy than it is in direct license and maintenance costs.

Bloody daylight savings

There are few things that irritate me more than forgetting daylight savings. Luckily we had the bank holiday Monday due to Easter this year. However there is one thing that is more annoying: Why the .... do Europe and the US need to switch one week apart?

I always forget about it when booking meetings, and inevitably end up stuck in the office an hour later than planned due to unavoidable conference calls with people in California.

The universe is out to get me.

Making the web a long tail broadcast medium

Anybody remember Pointcast? Back in the dot-com boom years, when push technologies were going to be the Next Big Thing, Pointcast was IT. "Everybody" was clamouring for a piece of the push action. Then nothing happened.

So where are we now? Push is finally maturing - conceptually - though data is really being pulled.

RSS has been a driving factor in making us reactive instead of pro-active when it comes to a larger and larger segment of our interactions with websites.

Push was "going to be big" back in the late 90's because it would let people broadcast to an audience, just like in traditional media. And that is what is finally happening.

As I'm watching the stats for my RSS feed, I can see instant feedback whenever I'm active in the form of more readers, exactly because software now works the way push was meant to.

While technically our readers are pulling the data, it's conceptually push - I put an item out there, readers pick it up and feed it to an aggregated view.

The conceptual difference is that readers don't adapt to specific patterns when they read your site, but they read when new material becomes available. This is also the key differentiating factor between books and newspapers, which you read when you have time (though the timeliness issue of newspapers makes the time you consider it interesting limited) and TV/Radio where you tune in when content becomes available.

Yes, you can come back and see it later, much as you can timeshift broadcast. But more and more content consumption is controlled by availability rather than a well defined time when we log on to check a few sites. We're "tuning in" to content rather than a specific source.

So while there are still obviously a lot of websites out there that are interactive, or where actions are user initiated, sites where timeliness is an issue, or where there is a demand for quick access to updated content, we're turning into information consumers in much the same fashion as we are with mass media.

Rather than seek and research, we're often content to sit back and deal with the information thrown at us.

That brings up the obvious question: Who will find the best way of building and monetizing these audiences? There's clearly already a significant revenue potential in "normal" advertising, but broadcasting, especially in the form seen with blogs, which are more like a talk show than recorded programming, has the advantage that it builds loyalty, and a personality that builds a large audience has the potential to extract far greater value than basic advertising.

Case in point: Oprah Winfrey. Get a mention for your book on her show, and you're rich. Get it into her book club, and you're even richer because millions of people are members. Both because she's a trusted personality.

Could she have gotten that position in a medium where updates can happen at any time, but you only find out the next time you feel like checking? No. Newspapers online have worked without push because of regularity - they spent fortunes on building a brand, or have offline editions that do it for them, to build an audience, and the audience keeps coming back because they know there will be regular updates.

And while RSS increases the timeliness for such news sources as well, by shortening the time before they have their headline in front of a user, RSS is the poor man's broadcasting - it brings the same timeliness to a tiny blog as a major news source, both significantly better than pre-RSS timeliness for most sites.

The outcome is a levelling of the playing field that makes it significantly easier to create the diverse, niche-driven push market that push providers like Pointcast were hoping would drive their revenue.

The reason it worked this time around? Decentralisation. Anyone can publish, so the amount of content has exploded.

While this poses a challenge to monetisation, it also creates tremendous possibilities for two groups of people:

Aggregators that can find ways of sifting through all this information that add value, and publishers that can find ways of creating compelling content.

The former because they get to be a "radio channel" with only licensed content - they have much more freedom in creating a line up than a traditional broadcaster which has to take chances on who will produce quality content. They can see what seems to take off, and create premium feeds for content that is valuable enough.

Publishers of content because they're no longer limited to finding someone to take their content - they can put it out there and use it to build an audience.

However it boils down to the long tail: You're suddenly targeting niche markets.

Newspapers target niche markets. It's just that they're targeting many of them at once. I read a newspaper mostly for the main news headlines, political commentary and technology. I couldn't care less about sports, celebrities, TV programmes, horoscopes, classifieds etc. I don't find technology a compelling reason to buy a paper anymore. Nor political commentary. Nor headlines. I can get all of them from disparate feeds - my news headlines mostly from the BBC. My political commentary from the Guardian and assorted blogs. My tech news from a long list of blogs.

A few people will be able to make significantly profitable blogs or RSS driven "channels" of articles with mass market appeal. But the real money is going to be in figuring out how to deal with that long tail - the huge number of smaller blogs that will never make much money individually, but that are all interesting to someone.

Ad networks may be one way to squeeze some returns out of it. But I have a feeling that the real place to be is as a successful aggregator: Finding the right balance in how to provide the right news source to the right people, and how to combine that with personalised offers that fit the content, and at the same time building an audience that trusts you the aggregator implicitly because of your pick of quality sources.

In many ways, the two roles - aggregator and publisher - might merge because one way of building that audience is mixing the aggregation of content with unique content that adds value to the aggregated content. A significant number of blogs already do mix these roles, as much of the content is commentary on other content - we're aggregating and commenting the same way a talk show host is.

However, whatever the winning formula: the problem of aggregation must be solved. I'm currently following around a hundred RSS feeds. Maybe 10% of the entries are actually of interest to me, and I suspect the ratio will get worse.

Squashed philosophers

Ever wanted to impress friends with your knowledge of philosophy, or just wondered what they were all about, but don't have time (or the interest) to read the full original works and try to understand them?

Squashed Philosophers is a site that provides you with a timeline of important works in Western thinking, with links to abridged versions, complete with summaries, a very reduced version and a somewhat longer one, including reading time estimates.

Note that many of the philosophers are represented with works that may be less known among people without much exposure to them. For example, Marx and Engels are represented with the early work "The German Ideology", not with better known later work such as the Communist Manifesto.

This selection of lesser known work from some of the represented people is perhaps a good move, as the well known works are also the ones you're most likely to know most about (or even have read).

Exploring the Semantic Web: MeNow and MusicBrainz

crsmith.net has an interesting entry on using RDF data from MusicBrainz to export information about the music tracks he's currently listening to, and how it'll allow him to link that information to, for instance, license data, review information, FOAF data etc. without having to explicitly combine the data sets: MeNow and MusicBrainz

March 28, 2005

1,250 moments of failure

Ghost Sites has a gallery with screenshots of 1,250 web projects that died between 1998 and 2004, together with 'web elegies' - annotations on the context of the failures.

See Ghost Sites: The Museum of E-Failure

It is hoped that this exhibit - a sample of cultural product created by the dotcom era's lost wunderkind - provides some small iota of insight into the Web's possibly central role in the future history of Dead Media.

Technological singularity

Daniel Lemire has a short entry on technological singularity, or in other words the idea that at some point humanity may develop technology that leads to a phase of rapid development that is so beyond our comprehension that we will be unable to predict even the relatively near future.

Consider a hundred years ago. Progress was slow enough that while you couldn't predict most events accurately, you could relatively safely assume that things would be for the most part the same 10-20 years later.

Today, can we? The internet rose to prominence in just a few years. And while cell phones (like the internet) have been around for decades, they too have had transforming effects on society in just a few short years.

Consider a hundred years into the future - can we still assume we'll know for the most part what society will look like 2-3 years ahead? One year?

I find the concept fascinating in part because we know so little about it, or whether it is even a real possibility.

Even if the concept is valid, is the singularity a static point? That is, will we reach a point where humanity is transcended by technology? Or will humanity advance in capabilities sufficiently that the singularity is some sort of ever receding horizon beyond which we can't make predictions with any degree of accuracy?

Daniel mentions he thinks AI is currently out of reach, and it's a view I share. A lot of AI technology, such as neural nets, is useful, and will continue to improve, but we are still far away from understanding enough to build something intelligent enough to call a real AI.

But he also raises the question if we want to create something more intelligent than ourselves.

To me that question is pointless. The question is "will we?" and the answer is "yes", because as soon as we have the ability anyone not making use of that ability will be left behind, whether it be in terms of defence or in terms of competitive ability in a marketplace.

Another important aspect of the idea of a singularity is that we don't NEED to get to the point where we can directly create something more intelligent than ourselves. We only need to get to the point where we can create a system with self improving intelligence.

If we manage to create ANY form of real intelligence in software, whatever real intelligence is, we already know that genetic programming has the potential to automatically evolve the software. If we manage to create intelligence coupled with a good enough system of increasingly complex competitive pressure, we may at that stage already have created the singularity.

Once that happens, the exponential improvement may happen by itself, in the form of massively accelerated evolution - not directed engineering.

Is it desirable or not? We don't know. There's no way of knowing, when/if the singularity happens, whether whatever technological advances result will be benign or not.

Arrrrgghh..

Off to paint the last bits of the living room... I hate DIY (not as much for the work itself as for losing the time I could otherwise have spent doing something else), so sometimes I think I'm certifiably insane for having bought a house that needs so much work.

More on my assembly language for parsing

As I wrote previously I'm experimenting with an 'assembly language' for parsing.

I'm about halfway through writing the parser for the assembler in itself and temporarily hard wiring it into the virtual machine to bootstrap it.

Lessons learned so far include: I really DO want a couple more high level instructions. I've only added BLT and BGT (Branch on Less Than and Branch on Greater Than) to the instruction set so far to make handling ranges easier, but I realise that I really want to create expanded versions of CMP, TRY and REQ (see the earlier entry for descriptions) that will handle ranges instead of single values, as it will dramatically simplify some rules.
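As a rough illustration of why a range-aware instruction simplifies rules, here is a small Python sketch. The tuple encoding, the run() function and the RANGE opcode are all hypothetical, made up for this example; only the names BLT and BGT come from the text above, and their operand layout here is a guess.

```python
def run(program, ch):
    """Run a tiny branch-only program against one character.

    Hypothetical encoding: each instruction is a tuple (opcode, *operands).
    Returns True if the program reaches ACCEPT, False on REJECT.
    """
    pc = 0
    while True:
        op, *args = program[pc]
        if op == "BLT":        # branch to args[1] if ch < args[0]
            pc = args[1] if ch < args[0] else pc + 1
        elif op == "BGT":      # branch to args[1] if ch > args[0]
            pc = args[1] if ch > args[0] else pc + 1
        elif op == "RANGE":    # made-up range variant: branch if ch is outside [lo, hi]
            lo, hi, target = args
            pc = pc + 1 if lo <= ch <= hi else target
        elif op == "ACCEPT":
            return True
        elif op == "REJECT":
            return False
        else:
            raise ValueError(f"unknown opcode {op!r}")


# "Is this character a digit?" using two branches...
DIGIT_WITH_BRANCHES = [
    ("BLT", "0", 3),
    ("BGT", "9", 3),
    ("ACCEPT",),
    ("REJECT",),
]

# ...and the same rule with a single range-aware instruction.
DIGIT_WITH_RANGE = [
    ("RANGE", "0", "9", 2),
    ("ACCEPT",),
    ("REJECT",),
]
```

Both programs accept exactly the characters "0" to "9"; the second one needs one instruction and one branch target less per range, which adds up quickly in a grammar full of character classes.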

The Kleene star and one-or-more instructions I hinted at would also have proved very useful, and will certainly be added. All in all I want to focus on two groups of instructions: High usability low level functions (i.e. they manipulate the state of the VM directly) and 100% "composable" functions - that is, instructions that can be composed entirely of low level functions. In my experience that makes implementation so much easier, and maintaining that separation means that you get good coverage of low level instructions so that any high level functionality you leave out is likely to be composable.
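The split between low level and "composable" operations can be sketched like this in Python. Everything here is an assumption for illustration (the State class, match_range(), star() and plus() are invented names, not the VM's instruction set): match_range() is the only function that touches the parser state directly, while star() and plus() are built purely on top of other rules.

```python
class State:
    """Parser state: the input text and the current position in it."""

    def __init__(self, text):
        self.text = text
        self.pos = 0


def match_range(state, lo, hi):
    """Low level: consume one character if it lies in [lo, hi]."""
    if state.pos < len(state.text) and lo <= state.text[state.pos] <= hi:
        state.pos += 1
        return True
    return False


def star(state, rule):
    """Composable: apply rule zero or more times. Always succeeds."""
    while True:
        saved = state.pos
        # Stop when the rule fails, or when it succeeds without consuming
        # anything (which would otherwise loop forever).
        if not rule(state) or state.pos == saved:
            state.pos = saved  # undo any partial match of the last attempt
            return True


def plus(state, rule):
    """Composable: one-or-more is simply the rule followed by star."""
    return rule(state) and star(state, rule)
```

With a rule such as `digit = lambda st: match_range(st, "0", "9")`, calling `plus(State("123abc"), digit)` consumes the three digits and stops at the "a".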

Note that I'm NOT aiming for Turing completeness, though it wouldn't be strange if it happens by chance. The goal is a very restricted language only intended for parsing.

I've thought quite a bit about it over the weekend, and in particular whether or not it is likely to be particularly useful, considering the availability of similar tools.

What I've realised, though, is that given the complexity of parsers, it's very rare for parser generators to meet my needs, and it's very rare for me to be happy with generated parsers without manual modification. However modifying lex and yacc parsers manually is something you'd probably not want to try. Modifying an assembly-like language, however, is fairly straightforward (that's the old 6510 and M68000 demo programmer in me speaking). I've actually written a whole compiler in M68k assembly before, and so this is bringing back memories - M68k assembly was actually quite well suited at least for the parsing aspect.

The advantage of a heavily restricted VM where programs will mostly use relatively high level constructs is that it will be easy to make a reasonably well performing interpreter for it. Given the current size (ca. 300 lines of C++) it is reasonable to assume that a competent programmer could port it to any language in a day or two, and instantly get access to working versions of any grammar translated into the language. This is one of the areas where most parser generators become a burden.

Another thing I want to try is building a relatively sophisticated BNF -> parser "compiler". Most of the constructs I've added are very well suited for translation from BNF, as I'm used to using BNF as a starting point for all parsers I write. There are quite a few things such a compiler could do very easily with the instruction set I'm working on to generate a much faster parser than what I'd write by hand (because if I write by hand I'd be aiming for simplicity...):

- inlining productions.
- "lifting" shared or "almost shared" initial terms in OR groups. If you write a grammar where a bunch of productions all start with a character, and use them in an or expression like this: (foo | bar | baz) it makes sense for the compiler to generate a single check for a range of characters to allow. This is fairly trivial to add.
- merging of similar subtrees. That is, if I have productions in an or-expression that expect the same initial characters it can often be fairly easy to check for the initial characters separately.
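The "lifting" idea in the list above can be sketched roughly as follows. This is Python rather than the VM's assembly, and the representation is an assumption of mine: each alternative in an OR group is described only by the set of characters it can start with, and the function names are hypothetical. The sketch unions those sets into a few character ranges that a single range check could test before any alternative is tried.

```python
def to_ranges(chars):
    """Collapse a set of characters into sorted, inclusive (lo, hi) ranges."""
    ranges = []
    for ch in sorted(chars):
        if ranges and ord(ch) == ord(ranges[-1][1]) + 1:
            ranges[-1] = (ranges[-1][0], ch)    # extend the current range
        else:
            ranges.append((ch, ch))             # start a new range
    return ranges


def lifted_guard(first_sets):
    """Ranges covering every character that can start any alternative."""
    allowed = set().union(*first_sets.values())
    return to_ranges(allowed)


def candidates(first_sets, ch):
    """Alternatives still worth trying once the next character is known."""
    return [name for name, first in first_sets.items() if ch in first]


# Hypothetical first-character sets for the group (foo | bar | baz).
FIRST = {"foo": {"a", "b", "c"}, "bar": {"d"}, "baz": {"x", "y"}}

if __name__ == "__main__":
    print(lifted_guard(FIRST))     # [('a', 'd'), ('x', 'y')]
    print(candidates(FIRST, "d"))  # ['bar']
```

If the next input character falls outside the lifted ranges, the whole group can be rejected with one check instead of one failed attempt per production.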

All of this serves to bring the resulting assembly closer to expressing an NFA for the language, but just expressing it in bytecode instead of a table of transitions. The advantage is that it'll be fairly easy to make it possible to turn these optimisations off to get readable assembly to look at to debug the parser.

Another advantage I see from this approach is exactly the debuggability - I already have a "tracing mode" for my VM that will output the exact instruction stream as executed, and it makes it so much easier to diagnose problems.

The last advantage I see is size. Expat (a C XML SAX parser) on my system is about 126K. The full VM executable (meaning it's also dragged in a lot of library code) plus most of the assembler and debug output, and without optimization, is currently 38K. A C version would probably weigh in at significantly less. Clean it up, add bytecode for an XML parser (given my experience with it so far I think that would weigh in at about 5-6K) plus conversion functions for UTF-8/UTF-16 and Latin-1, and I think it should be easy to get a full XML parser into less than 40K. Quite possibly less than 30... Looking forward to trying.

Playing with RSS

Been busy all day programming, and testing out RSS enabling assorted stuff, including my e-mail - just love how easy it is to churn out feeds from any information source available and instantly have it accessible from a wide variety of applications, including Firefox.

March 27, 2005

Programming Language Texts online

PLT Online is a collection of programming language theory texts and resources, all of which are freely available over the Internet.


Have Mass-Mailed Malware Peaked?

CRN has this article on the six year anniversary of Melissa: Six Years After Melissa, Mass-Mailed Malware Has Peaked

The article doesn't give any reasons for the belief that viruses such as Melissa are past their prime. For one, the article covers spoofed from addresses, but the main reason Melissa and similar viruses were so devastating was exactly that they didn't need any spoofed addresses - their strength was mailing from a user that had you in their address book, and hence using valid from fields wouldn't be a problem.

While most people have hopefully now learned to be careful about attachments etc., the problem with viruses using your address book is that the potential is there to make the virus much more insidious and effective.

For one, there is the potential to not start immediate bombardment of everyone in your address book, but to wait for outgoing messages with attachments and infect the attachments - people are much more likely to trust an attachment attached to what appears to be a fully legitimate message, and they're much less likely to suspect problems if their machine doesn't immediately freeze up due to massive amounts of outgoing mail.

Secondly, your sent mail folder is a trove of information for a virus - there's lots of potential for resending recent messages with attachments adding messages like "Hey, sending you another copy of this as I've made some updates" and similar.

The potential for virus writers is endless - in fact what keeps striking me with each of these virus attacks is how primitive most viruses seem to be. I'd be very surprised if we don't see more massive outbreaks.

I love Wing-Yip

While Croydon is hardly the most glamorous London borough to live in (but hey, Arthur Conan Doyle used to live up the road from me, and D.H. Lawrence used to teach at a school in the road I live in, so at least it was once cultured :) ), it has its advantages, one of which is the Wing Yip 'superstore'.

I love it. Over the last couple of years I've developed an addiction to dim sum, and it's annoying having to go in to central London to get my fix (though now I work just minutes away from New World, one of the best dim sum restaurants in town, provided you can stand the pushy 'service'). That was until we discovered the Wing Yip centre in Croydon only about 20 minutes by bus away.

Aisle after aisle of dim sum, Chinese snacks, noodles, cakes, exotic (to me anyway) fruits and other fun stuff I'll probably end up tasting even though I don't know what it is (always fun to try to figure it out only from taste...).

We've filled the freezer now, so we'll be steaming dumplings every other day or so for the next few weeks.


Tips for Mastering E-mail Overload

Harvard Business School has a site called "Working Knowledge" and I just happened to come across a link to an article there called The Leadership Workshop: Tips for Mastering E-mail Overload (found via the blog of Mike Arrington, a great guy I used to work with, whose blog I only recently stumbled over).

The article has lots of great advice. For me the e-mail overload has gotten to the point where I hardly answer non-work related e-mails any more (it's easier to get a response from me by comments on here than by e-mail, at least as long as the volume here is so much smaller than my mail volume), which is really bad when I occasionally get mails from old friends etc. that I haven't talked to in ages and I completely forget to mail them back, but the whole combination of work and my MSc. has really killed the concept of having spare time for me.

I do however think a lot about how to improve on e-mail and would really like to get back in the e-mail business - there's still money there, potentially a lot, for whoever comes up with the right productivity enhancing tools.

The very fact that e-mail is such a vital tool makes the value of an application that can save just a few percent extra of your time ridiculously high in a business setting.

At the same time I think pure e-mail is too limited. Testing the water with this blog has convinced me that there is an important space for integrating mail with web based technologies to take more control over how you communicate.

Combining direct communication with feeds of information that is relevant but not personal could do much to reduce the overload that is mainly occurring because what is treated by the reader as a one to one medium (it is likely the message is intended for you) is being treated by the sender as a one to many medium (the message likely has some relevance to all recipients, but is unlikely to need immediate attention from most of them).

While waiting for the technologies to get sorted out, this article provides useful advice for improving the usefulness of the e-mails you do have to deal with.

March 26, 2005

Just finished watching the new Dr Who

I've only seen bits and pieces of the older ones, so I can't speak much for whether or not it matches the old ones in style, but I was pleasantly surprised. Both with the quality of the effects, and the lighthearted approach.

It certainly didn't take itself too seriously, and had a few hilarious moments such as one character being "eaten" by a plastic garbage bin, and Billie Piper failing to realise she was talking to a very obvious plastic copy...

Even my ordinarily rather non-geeky fiancee enjoyed it and wants to watch it regularly...

A proposal to solve the 'Orphan Works' problem

This article at Groklaw looks at one option for solving the orphan works problem:

The Copyright Office has been holding hearings on access to "orphan" works. These aren't movies about kids who have lost their parents -- Little Orphan Annie, say. They are works which are still under copyright but have no copyright holder (or no locatable copyright holder.) It might sound esoteric, maybe even boring, But it isn't. Here's why I think it matters. "Orphan works" probably comprise the majority of the record of 20th century culture, and their orphan status means we have practically no access to them. In all likelihood no copyright owner would show up to object if one digitized an old book, restored an orphan film, or used an obscure musical score. But who can afford to take the risk? The normal response of archivists, libraries, film restorers, artists, scholars, educators, publishers, and others is generally to give up -- it is just not worth the hassle and risk. The result? Needlessly disintegrating films, prohibitive costs for libraries, incomplete and spotted histories, thwarted scholarship, digital libraries put on hold, delays to publication. And all of this waste is entirely unnecessary. Is there any solution? Duke's Center for the Study for the Public Domain has produced a report to the Copyright Office that offers one.

Interesting commentary from PJ as usual, and interesting reader comments too, so head over there and take a look.

ACM Queue - On Plug-ins and Extensible Architectures

Via Peter O'Kelly's Reality Check:

ACM Queue - On Plug-ins and Extensible Architectures - an interesting look at plug in architectures, using Eclipse as one example.


An 'assembly' language for parsing

I've mentioned my forays into push parsers previously. But after looking at that approach, I realised I needed a bit more flexibility. So I got the idea of designing a tiny assembly like language for building push parsers with. This is analogous to building an NFA or DFA, but with more operations, and the potential for being much easier to deal with manually.

I ended up with a tiny set of core commands, and I plan to add a few more convenience commands implemented in terms of the core. Here is the core command set I came up with:

RET -- Return with status flag set to true
BEQ -- Branch if status flag == true
BNE -- Branch if status flag != true
STO -- Store token buffer in numbered slot
TRG -- Trigger an event (used to "plug in" native code to build a parse tree)
ERR -- Return with status flag set to false. Unget any characters retrieved in subroutine, and revert token buffer to pre-subroutine state
CLR -- Clear token buffer
JSR -- Jump to subroutine. Push unget buffer and token buffer to stack.
JMP -- Jump
CMP -- Compare to input character, and set status flag accordingly. Yields if no input.
EAT -- "Eat" input character and add to token buffer and unget buffer. Yields if no input.

In addition I have so far implemented the following two compound commands:
TRY char -- CMP ch; BNE n; EAT; n: ...
REQ char -- CMP ch; BEQ n; ERR; n: EAT;
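To make the semantics of the commands above a bit more concrete, here is a minimal sketch of such a VM. It is not the actual implementation (that one is C++): it is written in Python, the tuple encoding and the `assemble`/`run` names are made up for illustration, and it makes two simplifying assumptions: the whole input is available up front, so CMP just fails at end of input instead of yielding, and the unget buffer is modelled as a saved input position.

```python
def assemble(source):
    """Expand TRY/REQ into core commands and resolve labels to indices."""
    code, labels, fresh = [], {}, 0
    for item in source:
        if isinstance(item, str):            # a bare string is a label
            labels[item] = len(code)
            continue
        op, arg = item
        if op in ("TRY", "REQ"):
            skip = f"_n{fresh}"              # the "n:" label in the expansions
            fresh += 1
            if op == "TRY":                  # CMP ch; BNE n; EAT; n: ...
                code += [("CMP", arg), ("BNE", skip), ("EAT", None)]
                labels[skip] = len(code)
            else:                            # CMP ch; BEQ n; ERR; n: EAT;
                code += [("CMP", arg), ("BEQ", skip), ("ERR", None)]
                labels[skip] = len(code)
                code.append(("EAT", None))
        else:
            code.append((op, arg))
    jumps = {"BEQ", "BNE", "JMP", "JSR"}
    return [(op, labels[arg] if op in jumps else arg) for op, arg in code]


def run(code, text):
    """Execute from instruction 0; returns (status flag, slots, events)."""
    pc, pos, token, flag = 0, 0, "", False
    stack, slots, events = [], {}, []
    while True:
        op, arg = code[pc]
        pc += 1
        if op == "CMP":                      # assumption: no input => false
            flag = pos < len(text) and text[pos] == arg
        elif op == "EAT":                    # only reached after a true CMP here
            token += text[pos]
            pos += 1
        elif op == "BEQ":
            pc = arg if flag else pc
        elif op == "BNE":
            pc = pc if flag else arg
        elif op == "JMP":
            pc = arg
        elif op == "JSR":                    # save return address and buffers
            stack.append((pc, pos, token))
            pc = arg
        elif op == "CLR":
            token = ""
        elif op == "STO":
            slots[arg] = token
        elif op == "TRG":                    # stand-in for calling native code
            events.append(arg)
        elif op in ("RET", "ERR"):
            flag = op == "RET"
            if not stack:                    # returning from the top level
                return flag, slots, events
            pc, saved_pos, saved_token = stack.pop()
            if op == "ERR":                  # undo whatever the subroutine ate
                pos, token = saved_pos, saved_token
        else:
            raise ValueError(f"unknown instruction {op!r}")


# 'a' 'b'* 'c', with the b-loop written out by hand.
PROGRAM = [
    ("REQ", "a"),
    "loop", ("CMP", "b"), ("BNE", "done"), ("EAT", None), ("JMP", "loop"),
    "done", ("REQ", "c"), ("STO", 0), ("TRG", "matched"), ("RET", None),
]

if __name__ == "__main__":
    print(run(assemble(PROGRAM), "abbc"))    # (True, {0: 'abbc'}, ['matched'])
```

A real push parser would suspend in CMP and EAT when it runs out of input and resume when more arrives; that part is deliberately left out here to keep the sketch short.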

I plan on adding a few more, including a "range compare" and possibly equivalent TRY/REQ variations, as well as a Kleene star command, and possibly a "one or more" (i.e. "foo foo*") operator.

As far as I can see it would be trivial to implement a "compiler" to compile BNF into this language, and the VM is less than 300 lines of C++ (including lots of debug output). It would be trivial to JIT or build a code generator spitting out C/C++ or another language as well.

I plan to spend some time writing a basic assembler for it first (tired of adding calls to build it inline without any relocation/label support...). Then we'll see.

So far I like this approach a great deal better than lex/yacc or other compiler construction toolkits I've seen.

Starved for Logic

I've mostly tried not posting anything directly about Schiavo - the closest I've been getting being my comments on a couple of the starvation pieces posted yesterday - because it angers me that people feel they have a right to meddle in what should have been a private matter.

However I would like to draw attention to this great piece over at Blogcritics highlighting the hypocrisy of complaining about Schiavo starving to death all the while opposing legalised euthanasia (as the former is a direct result of the latter being illegal, thus preventing doctors from doing anything but withholding treatment).

However what really got my anger rising was one of the comments posted.

I find it amazing that some people still try to use things like smiles and minor movements as proof that she's not in a PVS.

My question to those people is: Have you ever seen what happens to an Alzheimer's patient?

Because long after an Alzheimer's patient has stopped recognising you, long after they have stopped being able to talk and move about of their own accord, they will still sometimes smile, move their heads and look like there are glimmers of their old selves in there.

Except by then there isn't enough of the brain left for that to be possible.

Instead, their heads are mostly filled with plaque and mostly dead brain cells and they're reduced to the most bare physical reflexes.

What you see is an effect of how the brain works: The more recent and the more controlled by thought as opposed to reflexes something is, the earlier it tends to disappear. What you're left with is basic automatic reflexes.

Then motor functions go, and sooner or later an organ fails.

The thing is, a patient can be in a persistent vegetative state for years and give what appears to be responses to someone close to them, while the moment you start to rationally look at what they actually do, you will find no correlation with what happens around them.

I've seen it happen to both my grandmothers, which is probably why I react so strongly to the idea of "signs" like that.

One is now dead, the other may live physically for a couple more years, but long before her body gives out she will be reduced to bare physical reflexes as well.

For someone who just sees someone in a persistent vegetative state for a short time, or that wants to believe, it may easily seem like they're sometimes responding. They may smile seemingly at some comment, or chuckle briefly, or widen their eyes when someone enters the room.

But try asking them in any way you can think of to respond, and you will quickly see that the "response" is random.

The only real difference between an Alzheimer's patient and what has happened to Schiavo is that Alzheimer's is degenerative - you get to watch as part by part of the brain goes over a period of many years, so you get used to evaluating what is left of the person you loved. Perhaps that makes it easier to accept.

You may see "signs" if you refuse to believe such a person is gone because if you try hard enough to get a reaction, statistically you will eventually get one, at which point you're happy and stop your experiment.

I'm not a medical expert, and obviously I've not seen Schiavo, so perhaps she isn't in a PVS.

However I would claim that anyone that claims she isn't or can't be based only on what they've seen and heard about her reactions in the media is completely clueless.

It's a 404, loser

404

Do you care about lives or about votes?

First I saw the Blog for America piece that I mentioned earlier tonight, and now I notice Patrick Logan has been writing about the same thing:
Making it stick.: Preventing Death

He quotes 13,000 children a day dying of hunger related causes. Add to that another couple of tens of thousands dying of other causes, like easily curable diseases, as well as a huge number of adults and it quickly adds up.

Wonder how many lives the members of Congress could have saved if they'd just donated the money it cost to transport them to and from their little meddling session - surely far more than one.

So the obvious question is whether they care about lives or about votes. I know what I'd place my bet on.

ADTI invents new silly claims about open source

In a completely unsurprising move The Alexis de Tocqueville Institution's Kenneth Brown is trying to discredit OSS again.

After his miserable, failed attempt at discrediting the roots of Linux by claiming it was copied from Minix (a claim disputed even by Minix' author Tanenbaum) Brown is trying a fresh approach:

Brown finds it "intriguing" that many open-source contributors work for large IT companies. "Every day, an untold amount (sic) of employees beholden to strict employee/invention/intellectual property agreements, in their spare time (and even during work-hours) freely give away ideas, code, and products to open source projects," he writes. This opens up questions around the legal ownership of contributions, and could even open an avenue for a "disgruntled employee" to give away company secrets by contributing them to open-source projects, the report argues.

Interestingly, Brown conveniently "forgets" that a lot of these people are paid specifically for the purpose of developing contributions to open source.

He also ignores that this is much more likely to be an issue with proprietary software:

A disgruntled employee that releases code to the public will risk getting discovered immediately, because he/she is spreading the code, well, in public.

A disgruntled employee that goes to a shady competitor may get away with it, and may even get rewarded.

Interestingly enough there have been several lawsuits regarding the latter, but I've yet to see one where the former has been proven.

And the only company currently in active litigation over contributions to OSS, namely SCO, was nearly laughed out of court by the judge, who sardonically pointed out their complete lack of evidence so far. SCO even backtracked on their original copying claims.

Another point that might perhaps be lost on Brown: Most of these strict employee/invention/intellectual property agreements do not prohibit you from contributing what you do on your own time on your own equipment, and in many jurisdictions any clauses to the contrary aren't even legally binding on an employee even if they sign the agreement.

However, all of this is moot as long as Brown isn't able to show even a single example of what he claims must surely be rampant.

I think Santa Claus robs thousands of old ladies every year, honest, so it must be true even if I don't have a shred of evidence.

In any case, what I found most interesting with this article was the title and subtitle:

Think-tank report lays into Linux Guess what? Organisation is funded by Microsoft

... and the ending paragraph:

Brown's 2004 report alleged that credit for the origin of Linux should go to projects such as Minix, authored by Andrew Tanenbaum. That report drew criticism from many quarters, including Tanenbaum himself. "My conclusion is that Ken Brown doesn't have a clue what he is talking about," Tanenbaum wrote in a web posting at the time.

With packaging like that, I doubt too many people will take Brown's mindless rants too seriously.

March 25, 2005

Finished my essay!

The hardest part was cutting it down from 6000 to 4000 words, but I did it once I finally managed to stop procrastinating and actually did some work.

So now I'm going to spend the rest of the evening munching cheesecake and writing a parser engine - I got this idea for an assembly like language that would make it trivial to write a push parser based more or less straight from BNF (automated transformation from BNF would be easy too, but I'm not going to deal with that yet). I'll write more about it and perhaps put up some code later this weekend.

Hunger has a cure

Blog for America has a great piece called Hunger has a cure about the fact that 1 in 10 American households experience hunger or the risk of hunger and that more than 9 million children are the recipients of food aid in various forms:

Once again, Americans have lost their focus, and are being distracted by the unfortunate case of one individual, Terri Schiavo. On Monday, March 21, 2005 at 1:11am the President signed a bill in his personal residence, which would allow a federal court to intervene in hopes of replacing her feeding tube. How ironic the president's action was, when there are millions of Americans, including children, who go to bed starving every single night.

It reminds me of a situation in Norway in the early 80's when an aid organization started collecting money for food for USA's starving children, and the US Embassy delivered a formal protest claiming the US could take care of its own.

So why isn't it happening?

Developed countries (the US is by no means alone in this) still have a long way to go in eradicating poverty and providing proper safety nets.

Support British loos - Join the BTA!

The British Toilet Association - Campaigning for better public toilets for all

I've been looking for an organization to be active in, but perhaps this isn't it... :)

African mobile growth tied to faster development

I've long been aware of the trend towards bypassing landlines in Africa, in part due to the cost structures. These are not apparent in developed countries, where the vast cost of building out landline networks has been absorbed over a century by, to a large extent, heavily regulated telephone companies.

Mobile networks are today cheaper to build out, and as a result they've already far bypassed landline usage in many developing countries.

This article over at BBC News isn't new, but it had escaped me until today. It covers the growth levels, and also interestingly shows that growth in mobile usage can also be linked to increases in GDP in Africa.

Now, without seeing the study it's hard to judge whether the numbers say that a country with higher growth is likely to see higher mobile usage, or the other way around.

On the other hand, it's quite logical that as mobile usage grows, it fuels more growth in the economy, thanks to improved communication and the ability to conduct business more effectively.

US to sell arms to yet another military dictator

One might perhaps think that while in the middle of the quagmires of violence that are Afghanistan and Iraq under the guise of "promoting democracy", the US government would at least pretend they actually mean what they say, and not offer F-16s to Pakistan in a move that risks destabilising a highly volatile region (India has already started making objections). Pakistan is a country still dealing with the aftermath of Musharraf's coup, still suffering from massive civil rights abuses, and one where the military still holds a significant amount of power over civil life.

But I guess one shouldn't expect anything else, seeing as the practice of providing arms support to people such as Saddam Hussein has a long history with the US government.

Update: Just a quick note before I get any comments about it: Yes I am aware Musharraf held a referendum and election a couple of years back, no I'm not impressed. The election was marred by lots of irregularities, and while I'm sure he does have some level of public support, to me the fact that he is in his current position thanks to a coup and that he's shown repeated unwillingness to relinquish his power makes the elections moot - he belongs in jail, not in office. Anything else is an affront to democratic principles.

Graffiti artist 'exhibits' in New York's most famous museums

Graffiti artist 'Banksy' managed to sneak his works into several of New York's most famous museums, claiming he could do just as well as the artists exhibited: Wooster Collective on Banksy's stunt (with photos of the works).

I love stuff like this. Not only is it fun when stuffy institutions are stirred up, but it's great fun when it's done in a constructive way.

The most memorable quote, however, is this single sign of modesty (from this article on Banksy's interview w/NY Times):

I wanted to do the Guggenheim but there weren't enough paintings in it, I would have had to appear between two Picasso's and I'm not good enough to get away with that.

But he could get away with putting a painting of an Admiral with a spraycan and anti-war slogans in the background up for several days before being discovered...

Maybe he does have a point about the selection processes for these museums.

Done writing - now for the deletions and references

Finished writing the bulk of the text for my essay... Now I need to trim it down from 6000 words to the 4000 word limit. Somehow I think my 10,000-15,000 word dissertation next year is going to be rather straightforward. I hate having to cut, but I hate writing with the word limit in mind even more, so I tend to write far too much to make sure I cover everything I want, and then go through and try to pick the least important stuff to delete afterwards instead.

Then there's only the references left. At least I'll probably get it in tonight, so that I can enjoy the rest of the Easter weekend and focus on writing some cool programs instead.

On a related note, I've just signed up for the last two courses before my dissertation. I ended up with a course on computer architectures which should be a breeze (I first learned assembly 17-18 years ago I think) and one on information security which could actually be quite interesting.

My next question now is what I'll do once I've finished my MSc. I've been toying with the idea of an MBA, or alternatively taking some other courses first - possibly a BSc. in Economics together with some modules on British law. I've gotten addicted to studying again ;)

Easter means back to my essay...

Here in the UK we're off Friday and Monday thanks to Easter, so it means I have all of today and tomorrow to finish my essay on the Semantic Web (it's due by midnight Saturday). Hopefully I'll wrap it up tonight. I'd have finished it last weekend if it wasn't for the fact that I'm so easy to side track whenever I start looking at cool new technology - I have a half finished N3 parser and a half finished C++ DOM implementation to show for it...

If I finish my essay today, though, I should probably finish painting the living room before I let myself be tempted to continue work on them, or my significant other might get a tad annoyed.

We've had the house for 7 months now, and we're still not done with all the redecorating - once I've completed painting and laying the flooring in the living room we still have two bedrooms and the kitchen to go.

It really annoys me that the redecorating is keeping me from programming, but then the amount of money we saved by buying a house in need of some work is ridiculous.


March 24, 2005

Lilina News Aggregator

Lilina news aggregator is a browser based RSS/Atom reader with a great, simple interface and which doesn't need a database.

Check out this Lilina based site for an example.

(via vrypan|net|log)


Larry Lessig on Searching Creative Commons

Yahoo! has just launched a Creative Commons search as beta, and over at the Yahoo Search Blog Larry Lessig has this to say on Searching Creative Commons.

There's also this blurb by Mike Linksvayer on the Creative Commons blog.

Now, this is one of those cases where it would've been great if people added RDF-A style RDF data instead of "just" a normal link, in which case once you'd done this work once it would be instantly and automatically reusable for any other category of data...

Yes, I'm moaning about the Semantic Web again. Deal.

In the meantime though, it's great to see that Yahoo is giving that kind of recognition to Creative Commons (of course I'm biased since it's my employer...)

Wikinerds: Interview with Hurd developer Marcus Brinkmann

Take a look at the interview - it's long, but covers Marcus' work on Hurd, thoughts on GNU, micro kernels, becoming a programmer and more. Interesting stuff.


Revolting co-routines in C

I came across a link to this post (see in particular the response by Tom Duff) over at Brainwagon about using Duff's device (as if Duff's device isn't revolting enough to start with) to implement Coroutines in C.

Eughh.... Though it is kind of cute... No... Please let me resist the urge to actually USE this...

Update: And while you're at it, take a look at this 'threads' package as well. I haven't looked enough at it yet to tell if the implementation is as revolting as the stuff above.

March 23, 2005

Improve my bookmarks!

It just hit me that it's extremely annoying to manage bookmarks, and I REALLY want people to stick more metadata in their document headers, and for bookmark managers to extract it and annotate the bookmarks and let me use the data to search the stored bookmarks. It's one of those blindingly obvious uses of RDF/RDF-A, and one where Dublin Core entities are already widely used for Search Engine Optimisation purposes so a lot of data is already in there, some of it in a form that fits exactly or almost with RDF-A.
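As a rough idea of what the extraction side could look like, here is a small sketch using Python's standard `html.parser`. It is only an illustration under some assumptions: the metadata is taken from plain `<meta name="DC.xxx" content="...">` tags (plus `keywords` and `description`), the bookmark is just a dictionary, and the `annotate_bookmark` and `search` names are made up. It does not attempt real RDF-A parsing.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> and Dublin Core style <meta> tags of a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            name = attrs["name"].lower()
            if name.startswith("dc.") or name in ("keywords", "description"):
                self.meta.setdefault(name, []).append(attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def annotate_bookmark(url, html):
    """Build a bookmark record annotated with whatever metadata was found."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title.strip(), "meta": parser.meta}


def search(bookmarks, term):
    """Case-insensitive substring search over titles and metadata values."""
    term = term.lower()
    hits = []
    for bookmark in bookmarks:
        values = [bookmark["title"]]
        values += [v for vs in bookmark["meta"].values() for v in vs]
        if any(term in value.lower() for value in values):
            hits.append(bookmark)
    return hits


# A made-up page used only to exercise the sketch.
PAGE = (
    "<html><head><title>Parsing notes</title>"
    '<meta name="DC.creator" content="A. Author">'
    '<meta name="DC.subject" content="parsers">'
    '<meta name="keywords" content="vm, bnf">'
    "</head><body></body></html>"
)

if __name__ == "__main__":
    print(annotate_bookmark("http://example.org/parsing", PAGE))
```

A bookmark manager would of course fetch the page itself and store the result; the point is only that the data needed for searching is already sitting in the header.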

Couple it with other RDF data sources available for webpages, such as RSS feeds and Open Directory, and you could get quite good coverage.

The upside of marking up your static content this way is that it would make it trivial to put together a program to scan your site and generate an RSS file of all recently changed pages as an added value for your regular users.
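A sketch of such a scanner, to show how little is needed. This is Python with only the standard library, and it cuts corners that I should be upfront about: it uses the file modification time as the "changed" date and the relative path as the item title instead of reading any metadata from the page, and the function names, the seven day window and the example.org URL are all placeholders.

```python
import os
import tempfile
import time
import xml.etree.ElementTree as ET
from email.utils import formatdate


def recent_pages(root, max_age_days=7, now=None):
    """Return (mtime, relative path) for HTML files changed recently."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    pages = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith((".html", ".htm")):
                path = os.path.join(dirpath, name)
                mtime = os.path.getmtime(path)
                if mtime >= cutoff:
                    pages.append((mtime, os.path.relpath(path, root)))
    return sorted(pages, reverse=True)       # newest first


def build_rss(root, base_url, title="Recently changed pages", max_age_days=7):
    """Build an RSS 2.0 document listing the recently changed pages."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = "Pages modified recently"
    for mtime, rel in recent_pages(root, max_age_days):
        item = ET.SubElement(channel, "item")
        url = base_url.rstrip("/") + "/" + rel.replace(os.sep, "/")
        ET.SubElement(item, "title").text = rel
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "pubDate").text = formatdate(mtime, usegmt=True)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Demo against a throwaway directory standing in for a site root.
    with tempfile.TemporaryDirectory() as site:
        with open(os.path.join(site, "index.html"), "w") as f:
            f.write("<html></html>")
        print(build_rss(site, "http://example.org/"))
```

With the metadata from the page headers available, the title and description of each item could come from the markup instead of the file name.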

Usable XMLHttpRequest in practice

XMLHttpRequest in practice is a real gem of an article for anyone who wants to update their web applications with an XMLHttpRequest/AJAX type interface.

(Found at Blue Sky On Mars)

RDF and the Semantic Web ludicrous ideas?

I came across this: RDF and the Semantic Web are ludicrous ideas | Semiologic which is a short and thought provoking alternative view of the Semantic Web. I had to post a response, and I quote the most important parts here (go visit semiologic to read the rest):


People won't mark up every little bit they put online, but that isn't needed: People will mark up the bits they care about. I'd rather have info that matters marked up than all kinds of fluff.

Companies selling online will mark up their catalogues because 1-2% extra sales for adding some extra processing of their products database is worth it, and it won't take much of an audience to a new product search engine before they'll be able to reach that.

(...)

Your example is contrived, because there's no point in marking up everything - you mark up whatever will have its value increased through markup. That means data that it's important for you that people can find and reason about.

(...)

A vast amount of the web HAS semantic information associated with it in the databases and content management systems it's generated from - but that information is lost when it is output into a form that is only human readable and not easily machine parsable.

Unlock 5-10% of the database content that is tied to the net and we already have the Semantic Web.

Blogcritics: Accelerating the 'Roe Effect'

David Flanagan has written an article on Blogcritics called Blogcritics.org: Accelerating The 'Roe Effect'

While I don't at all agree with his arguments, I found it a very interesting read on the ideas surrounding changes to marriage.

Changing the concept of marriage

My main argument I guess is against muddying the religious concept of marriage with the secular benefits bestowed by governments.

Personally I believe it is discriminatory to allow some people to obtain benefits when living together in a committed relationship while others don't. I also question the rationale for restricting this in any form to two people. The main issue here is whether the government at all should sanction specific forms of relationships over others in this way, and my answer to that is no.

When it comes to the religious idea of marriage, I couldn't care less. If a church refuses to marry two (or more) people, then that is their business.

If someone chose to rename "civil marriage" to "government approved relationship contracts" or whatever, then fine. What it's about is rights and responsibilities in what is essentially a contractual agreement.

The very idea that marriage is a union between one man and one woman is a religious idea that there is no justification for keeping as secular law.

Separation of church and state is a protection for those of us with different world views - whether atheists (like me), Muslims, or any others who are not tied to the Judeo-Christian image of marriage.

If anything, disconnecting the two might make more people open to entering agreements covering their relationships, contrary to the current situation where marriage is an institution that is gradually becoming less important.

The spread of religion

The idea that Christianity spread by having more children does not benefit Christianity now. On the contrary, Muslim families are much more likely to have many children, and tend to be much stricter than Christians with regard to abortion.

Muslims are also much more likely to support polygamy than Christians, given that the idea of up to four wives is well ingrained in many parts.

At the same time I doubt this has much effect long term, otherwise how does one explain the significant decline in fundamentalist Christianity in terms of percentage of the population?

Conservative and fundamentalist Christianity is losing out because people are increasingly critical of their parents' viewpoints - rebellion is an accepted part of youth culture, and access to more varied viewpoints has made people more and more likely to make up their own mind rather than blindly accepting what their parents told them.

We're at a stage today where most "Christian" countries face a situation where regular church goers are now in a minority and people largely pick and choose to make their own version of Christianity that is significantly watered down compared to just a few decades ago (ask people how many believe in hell, for instance).

This is a trend that's been ongoing for hundreds of years, and is unlikely to be stopped by some slight changes in the number of children born to various types of parents.

About Haskell and why the quicksort example is bogus

About Haskell is a page about the functional programming language Haskell.

It's well worth a read if you haven't read up on functional programming before.

But whenever I read intros like these, there is one thing that provokes me: That bloody quicksort example.

Here is quicksort in Haskell:

    qsort []     = []
    qsort (x:xs) = qsort elts_lt_x ++ [x] ++ qsort elts_greq_x
      where
        elts_lt_x   = [y | y <- xs, y < x]
        elts_greq_x = [y | y <- xs, y >= x]

It looks quite straightforward, doesn't it: Quicksort of an empty list gives an empty result. Otherwise, quicksort of a list where x represents the first element and xs represents the remaining elements is equal to the result of applying quicksort to all the elements smaller than x, plus x, plus the result of quicksort of all elements greater than or equal to x.

The code is almost simpler than the description.

Contrast that with the pesky C implementation:

    qsort( a, lo, hi ) int a[], hi, lo;
    {
      int h, l, p, t;

      if (lo < hi) {
        l = lo;
        h = hi;
        p = a[hi];

        do {
          while ((l < h) && (a[l] <= p))
              l = l+1;
          while ((h > l) && (a[h] >= p))
              h = h-1;
          if (l < h) {
              t = a[l];
              a[l] = a[h];
              a[h] = t;
          }
        } while (l < h);

        t = a[l];
        a[l] = a[hi];
        a[hi] = t;

        qsort( a, lo, l-1 );
        qsort( a, l+1, hi );
      }
    }

Ohh. Nasty.

What the linked page doesn't tell you is that the reason the C version is nasty (apart from being pre-ANSI C, judging from the signature) is performance. It's a version that sorts an array in place, whereas the Haskell version has all kinds of potential for massive memory wastage unless the compiler is far smarter than most.

That aside, the C version could trivially be simplified too.

What does the C version actually do? The answer is: Almost exactly the same as the Haskell version.

The Haskell version used what Haskell calls list comprehension to partition the input data into two groups and a pivot (the pivot is the value we split the input data by, and in the Haskell example it's "x").

That's what those nested loops do. In Haskell, partitioning is built into the language. In C it's not, but hoisting the loops out of Quicksort easily creates a partitioning function (in C++ we're even better off: std::partition() would do the job for us) that reduces the core of the C quicksort to just a couple of lines:

Partition the array in place, recursively call quicksort on the two subparts before and after the pivot.

I'll post some comments on how simply you can do it in C and C++ later.
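As a rough sketch of the partition-and-recurse core described above, something like the following would do in C++. This is not code from the linked Haskell page or from any article mentioned here: the names `quicksort` and `quicksort_demo` are made up for illustration, and picking the last element as the pivot is an assumption that simply mirrors the C version quoted above. Only `std::partition`, `std::iter_swap`, `std::prev`, `std::next` and `std::is_sorted` come from the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Sorts [first, last) in place. Hypothetical helper; needs bidirectional
// iterators. The pivot is the last element, as in the C version.
template <class Iter>
void quicksort(Iter first, Iter last) {
    using T = typename std::iterator_traits<Iter>::value_type;
    if (std::distance(first, last) < 2) return;    // 0 or 1 elements: done
    Iter pivot = std::prev(last);                  // position of the pivot
    // Move everything smaller than the pivot value to the front.
    Iter mid = std::partition(first, pivot,
                              [pivot](const T& v) { return v < *pivot; });
    std::iter_swap(mid, pivot);                    // pivot lands in its final slot
    quicksort(first, mid);                         // part before the pivot
    quicksort(std::next(mid), last);               // part after the pivot
}

// Usage sketch: sort a small vector and check the result.
inline void quicksort_demo() {
    std::vector<int> v{5, 2, 9, 2, 7};
    quicksort(v.begin(), v.end());
    assert(std::is_sorted(v.begin(), v.end()));
}
```

The pivot is kept out of the range handed to `std::partition`, so its value cannot move while the predicate is still reading it. That also guarantees both recursive calls work on strictly smaller ranges.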

Meanwhile, if you want to see flexible sorting in C++ that's usually faster than quicksort (improving on quicksort by selectively switching to other algorithms), take a look at this article by Andrei Alexandrescu.

For the record, Alexandrescu boils the core of quicksort down to:

    template <class Iter>
    void Sort(Iter b, Iter e)
    {
        const pair<Iter, Iter> midRange = Partition(b, e, SelectPivot(b, e));
        Sort(b, midRange.first);
        Sort(midRange.second, e);
    }

Not far from the Haskell version in complexity, is it?

Whatever happened to...?

Endless fun: WEHT.net: The Online Compendium of 'What Ever Happened To' and 'Where Are They Now?

A great site to go to whenever you wonder why you haven't seen that guy in any movies lately and you're not sure if it's just because he's become a successful theatre actor or if he's become a drug-addicted killer... :)

Managing your way out of chaos

Through my career, I've "always" (with the exception of a one year stint in the middle) managed people. However it's been something I fell into more or less by chance, and I didn't have any formal training in management practices when I first did it.

I've recounted some of my learning experiences previously. But one of the most important lessons I've had is in how to rescue a team that's failing due to lack of communication and process. In this article I'll mostly focus on communication.

At one of my previous employers I got in over my head. I was managing the development, and managing the recruiting of the team. However I had no support - which isn't unusual in a startup, but also a real problem since much of the staff can be expected to be inexperienced in some of the roles they grow into.

I mean literally: no HR feedback or support in evaluating performance, and no support or feedback from my direct manager on anything but technical issues, which is the one area I didn't need any help in.

Of course, at first I didn't realise I needed help, and now I can do without much of it.

The problems I faced were also a key reason I finally decided to take up my studies again, pursuing an MSc part time alongside my work.

There were three main problems:

  • I had up to 10 direct reports (fluctuating through the period), spread over two locations, and no real support staff (team leaders etc.). This is far too much unless some of your team members are experienced managers
  • Feedback to my team wasn't good enough. This was partly a function of the item above, but partly also a function of lack of experience with some personality types and of some management techniques to make it not matter (more on that below)
  • Processes were non-existent to primitive

Large teams don't work

Generally, large engineering teams - I'm talking about the size of a group directly reporting to the same person - don't work. There needs to be a hierarchy in place, even if that hierarchy is fairly informal.

If you're spending enough time on each member of your team, you'll be unlikely to be able to handle more than 5-6 people efficiently, depending on how senior they are. If you're lucky you have people that are strong enough that you can leave them mostly alone, and you can go above this, but you should never count on it.

I did, partly because I didn't know better, partly because I got no guidance from HR (and they were clueless about how engineering teams work), partly because I had great experience with the first few team members to come on: They did what was expected of them and beyond, and they left me enough time to mess around and keep doing some coding.

The problem comes once you get into situations where people have things to talk to you about, or when projects grow large and complex, and you get drawn between technical issues and man management. Man management is "soft stuff". It doesn't have an immediate effect on project deadlines, so it's put aside.

Except it does. It does affect deadlines, and it's even more important in crunch times than when things are easy.

Dealing with a team as large as that with no support wore me out. I nearly quit. I became depressed and stressed out, and started going home early, and became more and more unavailable to my team because I felt I didn't have time: There were always unanswered e-mails, or meetings to go to.

A while later I had a row with HR over my salary, and part of the reason was my lack of communication. I nearly quit over that, in particular because I got extremely provoked by the fact that they had the gall to complain of my lack of communication when a large part of the problem was that I felt I had nobody to talk to to sort out the problems. In other words, I blamed my manager and HR.

Improving feedback

While I was right in thinking they were as bad at communicating with me as I was with my staff, I started working hard at improving my part of it, as the run-in with them made me realise I had to do something. Not for their sake, but for my team members' sake - they certainly did not deserve to be left to face the same burn-out and depression that I had been facing.

I once had to send one of my guys home because he started blowing up in people's faces after a particularly tough crunch. The warning bells should have been deafening.

Through the remainder of my time at this company I didn't see any evidence of my manager or HR improving their communication with me much. But I did see a marked difference in how I was working, and learned a lot.

Make people give you feedback

No, I don't mean ask for feedback. Some people will give you unsolicited feedback, and those people you generally don't have to worry about - they'll come to you if there are problems. Worry about the people who answer "it's going fine" when you ask them how things are going or how their project is coming along - when something DOES go wrong they'll still say the same thing.

They might not attack you personally behind your back, but they WILL (rightly) complain about how tough things are, and it will

This directly translates into a pattern of periods of relaxed work followed by frenzied crunch periods that just make things worse.

So, make people give you feedback. No torture instruments needed, but the time to sit down one on one with your team members is essential. It doesn't have to be formal, in a meeting room with you taking notes, but it does have to be semi-private. Enough so that someone will feel it's ok to tell you about why they've been grumpy at work, or why their project is going to hell and they need help.

But that's only the beginning. You need to show you're listening. You need to ask questions. You need to probe and show enough of an interest that they get that just fobbing you off with an "everything is fine" won't work. And the most important thing of all: When they do tell you about something that's going wrong, never blame anybody, including the person you're talking to.

Instead ask how they think it could be made better. Ask if things would work better if you assign someone else to help offload them so they can focus on the parts they do best. Be positive: They told you about the problem before it got serious (hopefully), so it can be fixed. That's good, not bad - problems always arise, but they usually only become real risks to the company when they're left unmanaged.

Be available

Never ever tell your staff you don't have time to talk. If you genuinely can't right now, suggest a time when you do have time and put it in your calendar the same day. If you have regularly scheduled slots for your team members the need to do this rarely arises, as people already know you're available for them to talk to, and will put off less important issues until your regular meeting.

But that does not mean emergencies can't arise.

Require status updates and read them

A weekly status update, at least in writing, helps a lot. If someone runs into problems, you should see it from the way their status updates change. But more importantly, a status update lets you get the basic technical stuff out of the way, and also gives you something to talk to people about - even the people who never talk to you or never talk to you about anything of substance.

Make your team understand you expect these updates. Just asking for them and not following them up doesn't help. Going on and on about them if your team sees no evidence that you ever read them also doesn't help.

Hang around

During my most depressing period at the company I mentioned, I always went home at 5pm. Which was great. For me. I did that in part because I really, really had to get out of there. It was suffocating. I did it in part because I know from past experience that when you're on the verge of burning out you must take time out and try to get your energy back, or you end up working longer and longer days with less and less to show for it - your productivity just nosedives.

What I failed to realise was that it was happening to parts of my team as well. Part of the reason for the crunches we were facing, apart from bad communication, was that the natural reaction to those crunches was for people to work until late at night.

A significant part of that work could have been done in a fraction of the time if I'd sent these people home, asked them to take a day off, and had them do the same work in a more controlled manner once they were well rested.

I knew that. I've been in that situation myself more than once. But I didn't recognise that in my team. I figured, hey, they want to work late, they can. I did ask them often "not to work too late", but coming from someone leaving them at 5pm when they're facing 5 (or 10!) more hours that's just demoralising.

Even if you can't contribute anything useful, you can provide your support (and cookies and soda, or whatever makes them happy...) And by being there, you can try to stop it from going too far: Even if the crunch is genuinely a problem, and they'll just be working late for a few days, it's counterproductive to have your staff keep working even when you SEE they're not getting anywhere.

Make them go home.

Improving process

Once you've got communications working, you're still faced with chaos. It's just that you finally know just how chaotic things are.

I'm not going to say a great deal about process improvement this time around, as I think I can probably write a long article just on that subject, but I will make some general comments.

First of all, DO NOT try to force a huge process onto an engineering team that has never worked with one before. That goes even if individual members have, or even if the whole team has, but they haven't worked with exactly the process you want. It doesn't work. Been there, done that.

The result can be anything from grumbling "acceptance" coupled with subversion at every step, to outright rebellion.

My process improvement work started with me beginning the process of writing a development manual. Great idea, if it hadn't been so flawed: A development manual should enshrine what you're already doing, not set policy. I am grateful nobody laughed out loud when I provided them with the first draft. Instead it was silently but politely ignored.

What I've since observed to work is a slow and steady process. Observe how the team works, and put in place one little measure at a time: A weekly status meeting, or even twice a week. Status updates. Change control forms. A simple change control process. Code reviews.

Expect it to take a couple of years to take a team from chaos to structure. But even after the first couple of weeks of actively trying to introduce simple changes you should start seeing improvements.

Successes

After I woke up to the fact that something had to change, I've had multiple opportunities to test out new things and improve my skills, and the result has been great.

One thing I was very happy about was that my first opportunity to improve was with the same team I'd had problems with in the first place - I quickly saw the change in the feedback I got both face to face and via other people, and productivity improved.

But what was the greatest part was when I was tasked with a major new project at a point when the team was split in two. I got a chance to test many ideas with a smaller group. While I still did not use regular one to one meetings, I did introduce more regular calls, and was much more available, and the difference was huge - for the first time in a long time I saw the team truly pulling together, and avoiding nasty surprises, and almost entirely doing away with the problem of regular crunches.

More recently, with my team at Yahoo!, I've had the chance to really make use of what I've learned, and spend a fair amount of my time specifically on strengthening communication and processes. It pays off. When the team works well together, my job is easier, and in the end we all benefit. That's perhaps the key lesson: Becoming a better manager isn't just good for your team, it also makes your day a whole lot more pleasurable.

March 22, 2005

Simple push parsers

I've been toying with a simple table driven push parser class today. Normally I write my parsers as recursive descent either with or without a separate lexer stage.

However I've always disliked pull parsers because they're inflexible - the parser, and not you, controls the amount of IO. As such they easily force you towards multithreading even when you could easily have multiplexed the application logic.

A push parser, by contrast, needs to work only on the input fed to it. A common way of doing that is in the form of a nondeterministic finite automaton or a deterministic finite automaton, or similar techniques such as a pushdown automaton, all of which can easily be designed to work with single character inputs.

However, I wanted a class that let me easily handwrite parts, so what I ended up with was the following:

A table driven parser with a table per production. For each entry in each table I store a flag to indicate if it's optional, a pointer to another table, and a pointer to an "acceptor object".

The "acceptor" is a simple class that provides methods to check whether or not it will accept the current character, and whether or not it has reached the end. It allows me to easily customize behaviour, and dramatically cuts down on states by letting me define generic constructs such as "recognise this string".

A simple parser class pushes states onto a stack until it reaches the first state with no pointer to another production. Once an acceptor is "done", the parser moves to the next entry in the topmost table. Once it reaches the end, it pops the state and skips to the next entry in the new topmost table. It continues until the stack is empty.

This is not to be confused with a pushdown automaton, where the stack is used to store symbols that have been parsed, not the history of states.

Actually, this is more or less recursive descent turned inside out - imagine writing a recursive descent parser in a language that supports co-routines: Instead of reading a character, the parser will always yield and won't regain control until a new character is available. Only in this case this is made explicit by returning and retaining an explicitly managed stack.
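To make the description above a bit more concrete, here is a minimal sketch of how such a parser could be laid out. It is not the class described in this entry: every name in it (`StringAcceptor`, `Entry`, `Table`, `PushParser`, `parses`) is hypothetical, the only acceptor shown is the "recognise this string" kind, and as a simplifying assumption the optional flag is only honoured on entries that have an acceptor rather than a nested table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Acceptor that recognises one fixed, non-empty string a character at a time.
class StringAcceptor {
public:
    explicit StringAcceptor(std::string s) : text_(std::move(s)) {}
    void reset() { pos_ = 0; }
    bool accept(char c) {                 // true if c is the next expected char
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    bool done() const { return pos_ == text_.size(); }
    bool started() const { return pos_ > 0; }
private:
    std::string text_;
    std::size_t pos_ = 0;
};

struct Entry;
using Table = std::vector<Entry>;         // one table per production

struct Entry {
    bool optional;                        // honoured for leaf entries only here
    const Table* sub;                     // nested production, or nullptr
    StringAcceptor* acceptor;             // used when sub == nullptr
};

class PushParser {
public:
    explicit PushParser(const Table& start) {
        assert(!start.empty());
        stack_.push_back(Frame{&start, 0});
        descend();
    }
    // Feed one character; returns false on a syntax error.
    bool push(char c) {
        while (!stack_.empty()) {
            const Entry& e = current();
            if (e.acceptor->accept(c)) {
                if (e.acceptor->done()) advance();
                return true;
            }
            // Rejected: fine only for an optional entry that consumed nothing.
            if (!e.optional || e.acceptor->started()) return false;
            advance();                    // skip it and retry c on the next entry
        }
        return false;                     // input left over after the last entry
    }
    bool complete() const { return stack_.empty(); }

private:
    struct Frame { const Table* table; std::size_t index; };

    const Entry& current() const {
        return (*stack_.back().table)[stack_.back().index];
    }
    // Push nested productions until the current entry is a leaf.
    void descend() {
        while (!stack_.empty() && current().sub) {
            assert(!current().sub->empty());
            stack_.push_back(Frame{current().sub, 0});
        }
        if (!stack_.empty()) current().acceptor->reset();
    }
    // Step past the current entry, popping tables that are finished.
    void advance() {
        while (!stack_.empty()) {
            Frame& top = stack_.back();
            if (++top.index < top.table->size()) break;
            stack_.pop_back();
        }
        descend();
    }
    std::vector<Frame> stack_;
};

// Feed a whole string one character at a time; true if it parses completely.
inline bool parses(const Table& start, const std::string& input) {
    PushParser parser(start);
    for (char c : input)
        if (!parser.push(c)) return false;
    return parser.complete();
}
```

With one table for a production like `greeting := "hello" [" "] name` and another for `name := "world"`, feeding either "hello world" or "helloworld" one character at a time empties the stack, while "hello there" is rejected at the "t". The caller decides when and how much input to push, which is the whole point of the exercise.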

I'm sure this isn't an original technique - it's too simple - but I can't remember if I've seen it described anywhere. If anyone recognises it from elsewhere, let me know as I'm always interested in finding out if I've missed any obvious optimizations.

Wired: Are socialites still Networking

Joanna Glasner at Wired has written the article Are Socialites Still Networking? that takes a look at whether social networking sites are living up to the hype.

She covers the split into two sub-groups: meeting friends and professional networking.

Personally I think the former is mostly useless, but find the latter of some use via systems such as Linked In that offer limited access (on a recommendation basis) to people that may have significant value for you to get hold of, but which you might have no idea of how to reach.

March 21, 2005

The failure of abstinence in sexual politics

According to a report by Yale and Columbia University researchers, many who pledge abstinence are at risk for STDs.

Among virgins, boys who have pledged abstinence were four times more likely to have had anal sex, according to the study. Overall, pledgers were six times more likely to have oral sex than teens who have remained abstinent but not as part of a pledge.

I'm not the slightest bit surprised by this - a large part of the pledgers are likely to have pledged abstinence because of expectations from relatives and social groups around them.

What is most worrying though is this bit:

The pledging group was also less likely to use condoms during their first sexual experience or get tested for STDs, the researchers found.

Again, this is not really surprising: if you've been pressured into pledging abstinence you're not very likely to be in an environment where you have learned about the risks of STDs and pregnancy, or where you are willing to take the risk of your parents and other relatives and "friends" finding out about your sexual escapades.

Abstinence didn't work previously in history, and there's no reason to believe it will work now - unless you force girls to commit to checks to verify their hymen is intact or similar barbaric practices.

It is quite damning how few "succeed", though:

Last year, the same research team found that 88 percent of teens who pledge abstinence end up having sex before marriage, compared with 99 percent of teens who do not make a pledge.

Nice work. So instead of having as many teens having sex, but being well informed about STDs and ways of protecting themselves, you have a slightly lower number having riskier sex and going into their first sexual experiences largely clueless about the risks and consequences.

The hypocrisy of religious groups that are highly likely to be "pro-life" abortion opponents supporting "sexual education" programs that make unwanted teen pregnancies more likely is what I find most absurd.

Visual programming

PlutoSpin- GIPSpin (Graphical Interface Programming) is the latest in a long line of attempts at visual programming systems (not to be confused with visual IDEs for text based languages).

I'm not convinced of the versatility of the approach they've taken, but I always like taking a look at new attempts in this field.

Why do all diagramming tools SUCK?

During work, and as part of my part time studies, I frequently have the need to create diagrams. Most often things like UML diagrams, flowcharts and ER diagrams.

So far I've yet to find a single usable diagramming package, be it closed source or open source.

The general problem seems to be one of two things: Either a package is completely unstructured, and you can do whatever you want, OR a package constrains you to rules, 90% of which will be fine but 10% of which inevitably clash with your particular taste.

Visio is the worst I've come across of the latter. If you choose one of the "formalised" types of diagram in Visio, it turns into a nasty, arrogant know-it-all that places a lot of constraints on the way you draw diagrams OR forces you to forfeit all support.

XFig, for instance, is the complete opposite: Generally you're left to your own devices, but it means complex diagrams are extremely tedious to draw.

What I want is to be able to draw "anything". That is, I want to be able to break as many constraints as possible. On the other hand, I want it to be easy to stay within the constraints, and I don't mind a (non-obtrusive) reminder of what I'm doing that doesn't fit the current ruleset. If it's easy to amend the current ruleset, then even better.

One app I really like in many respects is Ideogramic UML. It's not open source, but a limited version is available for free as in beer. Its main attraction for me was the gestures (which worked surprisingly well and saved me from that other monstrosity: the overgrown palette) and its basic support for freehand drawing to annotate the diagram. My main criticism is that you're still forced into a too restrictive model, and the freehand drawing (though a great idea) is too limited because it's essentially, well, freehand, with no drawing tools available.

Surely this can't be that hard? Checking that you conform to a model without yelling and screaming if you don't - just display a warning in a status bar or change the color of the offending element (and let me turn it off) instead of refusing to accept it.

Allow me to use arbitrary geometric shapes.

Provide a library of elements that have a certain look and certain places to put text and attach connectors, without enforcing specific semantics.

I want a smart diagram editor, not a modelling tool enforcing a specific form of modelling. I want nice-looking diagrams, not source code generation...

So why can't I find anything that's usable?

Semantic web as future reality

This entry at Fred on Something neatly summarises my painful experiences while reading the W3 specs and assorted tutorials this weekend:

The thing is that RDF is not intended to be easily understood by humans like simple XML documents. RDF is intended to be understood by machines.

However, I still think the lack of accessibility of the W3 specs is a big problem. The XML spec is reasonably accessible. Even the XML Schema spec is. I can sit down with them, read them, and start writing a parser. Granted, it wouldn't be a very good parser if I didn't know more than I'd learned from a single reading of the specs, but I'd be able to.

It's less important that the formats are inaccessible if the specs are easily accessible so we get good tools to deal with them.

Nobody cares that Postscript is painfully obtuse to read in a text editor, and that doing so won't really tell you much about the document it describes, because we have good tools to manipulate postscript files and few of us need to interpret the files directly.

However the RDF and OWL specs are painfully dense, painfully fluffy, and full of mathematical terms that for me and most software engineers I know read as mostly nonsense.

This massively complicates the issue of getting good tools to work with it, and at this early stage even makes it hard to get people to understand the potentials of the technology.

I'm sure these specs represent great work, but it could have been so much better if more effort had been put into 1) examples and 2) presenting the normative semantics by specifying the intended effects in terms of observable effects on the RDF graph, or conceptual addition of RDF triples (even if the implementation wouldn't necessarily have to store these triples).

The triples aren't hard to understand. The RDF graph isn't hard to understand. The bloody description of the OWL semantics IS.
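As an illustration of what "conceptual addition of triples" could look like in practice, here is a small sketch. It is not taken from any of the specs: the `Triple` and `Graph` types and the `infer` and `infer_demo` functions are made up for this example, plain strings stand in for URIs, and only two rules are covered, both from RDF Schema: rdfs:subClassOf is transitive, and an rdf:type statement carries over to every superclass.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// A triple is (subject, predicate, object); strings stand in for URIs.
using Triple = std::tuple<std::string, std::string, std::string>;
using Graph = std::set<Triple>;

// Keeps adding inferred triples until a pass adds nothing new.
inline Graph infer(Graph g) {
    const std::string sub = "rdfs:subClassOf";
    const std::string type = "rdf:type";
    bool changed = true;
    while (changed) {
        changed = false;
        std::vector<Triple> added;
        for (const Triple& a : g) {
            for (const Triple& b : g) {
                // b must say "C subClassOf D", and a must end in that same C.
                if (std::get<1>(b) != sub) continue;
                if (std::get<2>(a) != std::get<0>(b)) continue;
                const std::string& p = std::get<1>(a);
                // (x subClassOf C) gives (x subClassOf D);
                // (x type C) gives (x type D).
                if (p == sub || p == type)
                    added.emplace_back(std::get<0>(a), p, std::get<2>(b));
            }
        }
        for (const Triple& t : added)
            if (g.insert(t).second) changed = true;
    }
    return g;
}

// Usage sketch: Rex is a Dog, a Dog is a Mammal, a Mammal is an Animal.
inline void infer_demo() {
    Graph g = infer(Graph{
        Triple{"ex:Dog", "rdfs:subClassOf", "ex:Mammal"},
        Triple{"ex:Mammal", "rdfs:subClassOf", "ex:Animal"},
        Triple{"ex:Rex", "rdf:type", "ex:Dog"}});
    assert(g.count(Triple{"ex:Rex", "rdf:type", "ex:Animal"}) == 1);
}
```

A description along these lines says nothing about how a real implementation stores or computes anything. It only states which triples you are entitled to behave as if were there, which is the kind of observable effect I would have liked the specs to be written in terms of.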

I wish the W3 would take some cues from ECMA, and do what ECMA did for ECMA 262 (the ECMAScript / Javascript specification), where the document specifies the semantics of the language by presenting expected results in terms of code rather than abstract mathematical terms.

Personally I have this intense hate for these kinds of specs as they're hardly ever needed.

I have no problems understanding how to implement a backpropagation neural network, for instance. However that is thanks to plain English or pseudo code descriptions of the algorithms involved. If somebody tried showing me a mathematical representation of it I'd glaze over instantly.

I've yet to see a single example of something presented in this kind of notation that isn't possible to do just as well in natural language, and that will be significantly more accessible to a significantly larger audience.

If you want to win the Fields Medal then accessibility to the general public isn't needed, as long as other leading mathematicians understand you. If you try to write specifications with the goal of transforming the web, which became successful largely because it was accessible and anybody could easily understand how to make use of the technology, it is.

N3 as a logic language

After yesterday's entry Understanding the Semantic Web: N3 to the rescue I went on to spend some time actually starting to write an N3 parser. The language is straightforward enough, though the BNF grammar was a bit awkward.

That might be because it was actually generated from an N3 description of N3 itself. I ended up reading the N3 description instead of the BNF, and got most of the way to having a working language checker (as in, it parses most of the language but throws away the result) in a couple of hours, and plan to fill it in to create a proper parser later this week.

N3 seems promising to me both as a way of exploring RDF and OWL and as a data format in its own right.

I'll want to implement a basic RDF storage model as well, but that seems quite straightforward (I'm looking at a testing ground, not production quality code).

I was looking at Redland yesterday too, and while I'm sure it's a fine system, it just seems far too complex for my taste.

N3 really drove home the idea that what RDF-S and OWL and the rest are really about is simple logic programming based around Horn clauses. It's a very constrained and simple model, which is good because apart from some toying with Prolog when I was a kid I haven't spent much time on it - this is a great opportunity to read up now that I have real world applications for it.

Open Source as research

Martin Fowler has a short interesting article on how Open Source contributes to R&D

March 20, 2005

Understanding the Semantic Web: N3 to the rescue

I've spent most of today reading up on RDF, OWL and other painful stuff. Things were going really slowly (what f******d decided using set theory to describe the RDF semantics was a good idea, when it could have been so "simple" if they'd instead just explained things in terms of what triples could be inferred) until I came across N3.

I briefly mentioned Metalog earlier, and that was a great start - allowing me to play around with "human readable" assertions. But N3 is a step closer to the "real thing", and in fact Ntriples, a reduced form of N3, can be generated by Metalog.

N3 is part of a Semantic Web Application Platform (or Playground) set up to facilitate practical demonstrations of semantic web technology. So far it's succeeded for me - it's told me far more about the Semantic Web than any of the specifications.

The N3 grammar seems clumsy and badly documented, but there is a great tutorial covering N3 and how to apply it to the Semantic Web.

If you feel brave, you might also want to take a look at Euler - a Java app to verify conclusions by inferring proofs for them. The Java code for Euler is some of the nastiest stuff I've seen (1700 lines in one class and pages upon pages in a single function) but it seems like something worth investigating further once I've digested more of the N3 stuff.

Google sued by French Press

Agence France Presse is suing Google for syndicating their news stories on Google News.
I found out about this via this post at ThreadWatch. See also this CNet News.com article.

This lawsuit cuts to the heart of fair use and the copyright control of databases. The comments on Threadwatch appear divided - particularly with regard to whether opt-out (via robots.txt) is sufficient.
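For anyone who hasn't dealt with it, the robots.txt opt-out works roughly as in the sketch below, which uses Python's standard urllib.robotparser module. The robots.txt content, the bot names and the URL are all hypothetical; this only shows the mechanism and says nothing about what any real crawler or publisher does.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: one named crawler is asked to stay out of
# /news/, everyone else is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /news/

User-agent: *
Disallow:
"""


def allowed(robots_txt, agent, url):
    """Return True if robots_txt permits the given user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


STORY = "http://example.com/news/story.html"
news_bot_ok = allowed(ROBOTS_TXT, "ExampleNewsBot", STORY)
other_bot_ok = allowed(ROBOTS_TXT, "OtherBot", STORY)
print(news_bot_ok, other_bot_ok)
```

Note that the file is only a request: it is up to the crawler to check it and honour the answer, which is exactly why the "is opt-out sufficient" question comes up.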

However, for most search engine features, opt-out is clearly sufficient: Fair use protects your right to quote from and refer to a work by title and other distinguishing features, or to describe facts about a work or from a work in general. The typical search engine listing is well within the limit, and if the search engines wanted to, they could likely safely ignore robots.txt with no legal consequences.

The features that are more interesting are Google's cache, which is an outright copy, and their aggregation of data that a company such as Agence France Presse may try to assert a database copyright for, since they provide a syndication service.

This is more likely to involve database copyrights.

In the US, a database can only be protected if it shows originality in its selection, coordination and arrangement. This means that a purely automated aggregation of all news items provided by partners, for instance, likely would not receive any protection.

However even then, extracting facts from a database is legal, short of duplicating the structure of the whole database.

Under EU law, databases have sui generis protection, that is, you can't reuse data from a database while it is protected (protection lasts for 15 years), even if the data is pure facts. If you want to use these facts you must compile them yourself from other sources. If Google have used AFP's aggregated data to compile the news, this may well be illegal under French law - however it seems weird for AFP to sue in California, as the US Supreme Court have explicitly rejected the idea of such protection under current US copyright law.

The paradox is that Google (and anyone else) clearly have a legal right to publish the location, descriptions and titles of these news articles under fair use, but that there is a chance that aggregating them may violate AFP's copyrights depending on how it's done, and what jurisdiction we're talking about.

(Btw. I am not a lawyer - don't rely on me to decide whether or not it's safe to aggregate data...)

Groklaw: Report from UK PTO software patent workshop

Groklaw has this eyewitness report from the UK Patent Office's Technical Contribution Workshops related to the EU software patents directive.

Don't expect much to come out of this - the UK government is firmly pro-software patents, but it's interesting anyway.

Ah... I get it now: It's Prolog all over again

Metalog - the semantic web query/logical system seems to be exactly what I've been looking for in terms of allowing simple exploration of Semantic Web technologies without all the lofty promises clouding things up.

Take a look at the quick guide and if you've ever read an introductory article on Prolog you'll be right at home... (In fact Metalog uses Prolog for its reasoning support.)

It's helpful in that it allows you to translate the Metalog input into RDF triples and RDF/XML format, while playing with it in a pseudo natural language, so it drives home the mappings much more effectively than most tutorials I've seen.

I wish Metalog had OWL support too, but it's a start, I guess.

Ontology Development 101: A Guide to Creating Your First Ontology

Stanford has a great tutorial online as part of their Protégé project (an open source ontology editor in Java w/an OWL plugin). You can find the tutorial here

I found the link to Protégé at Chaz Blog - seems he's running into some of the same problems I do with how to apply this stuff in practice.

What on earth are Ontologies and taxonomies?

Search Science has a great little summary here

March 19, 2005

The complexities of OWL

I've spent today working on an essay on the Semantic Web, and reading up more, particularly on OWL.

What hits me is the complexity. The OWL Guide was a big help, but I still find it difficult to see how to apply it to the real world. I mean, I can see the potential - the idea of being able to effectively convey semantics, even in the face of data using different ontologies, and the promise of being able to query about properties that are not explicitly written out in the data through machine reasoning.

But I've yet to find a tutorial or introduction that explicitly addresses more directly useful scenarios instead of the "let's build a complex ontology" scenario.

That, and the lack of a wide choice of tools to reason about OWL ontologies means that we're likely still years away from seeing the real promise of the Semantic Web realised.

In the meantime there is still lots we can do to approach the Semantic Web gradually. One of the main things is to embrace RDF directly or through RDF-A or GRDDL. Without OWL we're stuck doing things like inferring mappings between various ontologies by ourselves, but the more widespread RDF datasources become, the more incentive we create to invest in tools that can solve the interoperability issue (whether by making OWL usable, or by finding something else).

I'm curious to what extent the complexity of OWL is needed, or whether it is complex because the problem is still not sufficiently well understood and a simpler solution may come along.

Writing 4000 words on the Semantic Web

... turns out to be very easy. My biggest problem with my essay assignment is going to be to cut it down enough to fit within the word limit.

I should have done less reading before I started :)

I won't post the full essay, but I'll make notes on any interesting ideas I get while writing it, and write a few entries on it after I'm done.

Wired summary of EU software patents

Wendy M. Grossmann has written a good summary of the current status of the EU software patents mess here

Defender of the Linux faith

News.com has an interesting article titled Defender of the Linux faith about Harald Welte and his work on gpl-violations.org.

Since setting up the project, Welte has made 25 agreements with companies that were violating the GPL, as well as setting up two preliminary injunctions and one court order. Each of these companies used GPL code without making the altered source code available--a requirement of the licence.

David Weinberger on taxonomy and tags

David Weinberger has posted some short notes from his birds of a feather session on taxonomy and tags at etech that are well worth a read.

They sum up quite simply the differences between "real world" taxonomies, structured around the concept of straight subdivision of concepts, and the emerging categorisation that occurs with tagging, where everything is "one big pile" of concepts with added semantic markup that leaves the categorisation to users (whether human or software).

UK school micro managing pupils' lives for no good reason

According to the BBC a UK school bans a girl for getting hair braids. What is it with UK schools and their fascist obsession with controlling students and the way they dress?

After nearly 5 years in this country I still don't get why this country has such an obsession with turning school children into sheep with no capacity for independent expression.

March 18, 2005

eWeek: Linux & Open Source Header

SCO Has Its Day in Nasdaq Court

eWeek has an article titled SCO Has Its Day in Nasdaq Court that summarises the issues surrounding SCO's missing 10-K and their meeting with the Nasdaq Listing Qualifications Panel (No decision on their delisting yet)

Dan Gillmor has some scathing comments on Bush and taxpayer-funded propaganda.

It's scary reading. I've never had any illusions about the objectivity of the press, but I have generally assumed a certain minimum level of honesty, even if blatantly biased according to an editorial agenda.

This kind of propaganda is more insidious. A right wing journalist that writes about news from a right wing viewpoint because they believe in it I have no problems with. It's relatively obvious most of the time. But news items produced like this, with no proper attribution that allows you to consider the source with a critical mind, are nasty.

But then I'm not really that surprised, given the, from a European viewpoint, extremely right wing slant of US media (which makes it hilarious whenever US right wing people bring out the "liberal media" complaint).

European mainstream news spans a much wider spectrum, all the way from communist newspapers in many countries including L'Humanite (in English), to right wing, extremely conservative papers such as the Daily Telegraph, and of course we do have the local variation of Fox.

Regardless of what you think of them, this wide span gives you an important corrective, as it makes it much harder for anybody to get away with anything - if a left wing government attempts to feed its voters propaganda, there are plenty of right wing newspapers that have every interest in tearing it to shreds, and vice versa.

In a "media monoculture" like the US media looks like to me, getting away with distributing propaganda is much easier - nobody really has much to gain by rocking the boat.

A photo a day...

There is this crazy guy who has done something I've thought about every now and again, but always been too lazy to do anything about.

Since October 1st 1998, except for a few months break in '99 and early 2000, he has taken a pic of himself every single day.

Would be interesting to see an animation of all those images...

CGI spec makes me start feeling old...

After yesterday's entry on old tech, I suddenly for no reason today searched for "cgi specification" to see if the original page was still there. It is: The Common Gateway Interface

It's been unchanged for more than 9 years now, according to the Last-Modified header, and I remember seeing it for the first time back in '94 or '95...

At a time when the average lifespan of webpages is on the order of days, it's fun to see that some vestiges of the early days of the net are still there...

Of course, we have the Wayback machine for a lot of stuff, but it's not quite the same :)

I won't do your donkey work for you Ebay

Yesterday I got a phishing scam in my mailbox, purporting to be from Ebay, and asking me to confirm my card number. Apart from the obvious scam alert that was immediately ringing in my ears on reading "confirm" and "card number", I've never even used Ebay (yes, I'm one of those people).

Nice as I try to be, though, I generally make a quick attempt at forwarding the mails to whichever company they pretend to be, to help them get it stopped. So in Ebay's case I tried forwarding it to abuse@ebay.com, assuming that was a reasonable choice. Not being directly affected to this, I wasn't

Apparently so must other people, because they have an autoresponder there. However I was less than pleased with the response:

REPORTING SPOOF

If you received this message after attempting to report an email that
appears to have come from eBay but actually directs you to another
site,
you must forward the message to us again by using the forward function
of your email program. Make certain that spoof@ebay.com is in the "to"
field. Do not alter the subject line, add text to your message or
forward the email as an attachment.

Now, I can understand that they have a preferred address to receive this to, but they are already receiving mail to abuse@ebay.com, and I'd think phishing scams would be one of the best candidates for an address like that.

They're also telling me not to alter the subject line, add text to the message or forward it as an attachment, all of which will require me to change settings on my mail program to forward this to them, and all this before a human has looked at the damn message I sent.

If I were to spend time doing this for all the junk I receive, I'd never be doing anything else. I'm happy to forward a message to them. I'm not happy to do it twice and jump through hoops to warn them about something that doesn't affect me in the least.

Looking at Ebay's homepage, there is a link to their "security center" as the only obvious alternative. However, going through that link brings you to multiple choice hell, after which you're kindly prompted to log in or register. I'm not going through their security links because I want to be an Ebay user - I'm going through it to report a fraud problem.

So from now on, whenever I receive a mail claiming to be from Ebay, I'll click the "spam" button in my mail client and hopefully it will soon be trained to give Ebay the same automated treatment they gave me.

Groklaw: Slip, Sliding Away on Software Patents

This article over at Groklaw covers how the EPO (the European Patent Office) went about justifying the granting of software patents even though current EU patent law doesn't allow it.

It follows up with some quotes on the future of patents from the USPTO.

March 17, 2005

Pervasive version control

Martin Fowler, consultant and author of books such as UML Distilled has an interesting short article on Subversion:

As someone who uses version control all the time, I think it's something that can grow into more areas of computer use. Other than software developers, few computer users use version control. Yet as software developers know, version control is a great mechanism for collaborative work, allowing multiple people to work together on a single software system. What would be the benefits of version control being more widely used?

Yahoo pledges full Firefox compatibility

It's strange when you work at a company and find out cool new stuff like this from a news site first, but that's the way it gets in a large company... Anyway, I was thrilled to read this ZDNet article.

Most of the Yahoo! services I use work fine with Firefox anyway, but I know a lot of people that'll really be hoping for things like Launch to be supported.

Code v2 - Transformation

What's most interesting about Lawrence Lessig's latest experiment - the release of Code v2.0 on a Wiki - is not that he released it for free. By now that is no longer a novelty.

Neither is it particularly novel that he releases it on a Wiki. Lots of large Wiki projects are flourishing, and it doesn't seem particularly strange to put a free book online like that.

What is novel is the way it has been done. Code v2.0 hasn't just been dumped on a Wiki and left there to bitrot as happens to many works. It's been put out there with an explicit agenda, and a significant set of comments to guide a transformation of the book with a definite purpose.

I have read some of Lessig's articles, but have never read Code. Time to take a look, and who knows, maybe I'll decide to try providing some input. Just because it's there. I hope this experiment succeeds, as if it does I'm sure more will follow.

My 100th post

I don't know whether I should be scared or upbeat about having posted 100 entries in less than two weeks... But it's a fun experiment, and I've learned more about the semantic web, RDF, Atom and a variety of other subjects in the last two weeks than in the previous couple of years...

Maybe my next project should be to get a better stylesheet in place, though...

Semantic Web Round Up

I've written a few entries about the Semantic Web already, but since my deadline is nearing on my essay for the MSc. course I'm doing, I've started rounding up a few links that I think are worth sharing as well.

That's a debate that seems to be getting more and more heated.
On the more artistic side Dan Cooney has an interesting interpretation (via Edward Vielmetti).

The Semantic Web is here is a great introductory presentation by Eric Miller (via hannes.kaywa.com)

At the heart of this debate is the discussion on whether folksonomies or ontologies provide the most value in a distributed, uncontrollable medium like the internet.

This also tends to translate into a debate on whether to use micro formats or RDF as the carrier of semantic information.

Some thoughts on RDF vs micro formats

A quick and dirty RDF tutorial (via Ebiquity blog at UMBC - Thanks!) is a good way to start if RDF and the semantic web are completely new to you.

Regardless of past experience with the semantic web I would also recommend Tantek's presentation The Elements of Meaningful XHTML which provides a great overview of Micro formats and how to convey as much semantic information as possible through the use of XHTML alone.

However, while I see micro formats and overloading XHTML as useful to some extent, it misses a lot of the potential of the Semantic Web by not making the semantics of the markup easily discoverable, for instance through RDF-S or OWL.

RDF without the nasty syntax

One way of getting that benefit while at the same time achieving much of the simplicity and bottom-up approach to the semantic web is RDF-A.

Instead of specifying individual micro formats, RDF-A is an attempt to provide a trivially simple way of attaching attributes and elements to an XHTML/XML document from which it is easy to derive RDF triples.
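As a rough illustration of what "easy to derive RDF triples" could mean, here is a small Python sketch. It is not a conforming RDF-A parser: it only assumes that elements may carry a "property" attribute, an optional "content" attribute, and an optional "about" attribute naming the subject, loosely modelled on the drafts. The sample page, the URLs and the author name are made up.

```python
from html.parser import HTMLParser


class TripleExtractor(HTMLParser):
    """Collect (subject, predicate, object) triples from property attributes."""

    def __init__(self, base):
        super().__init__()
        self.base = base        # default subject: the document itself
        self.triples = []
        self._pending = None    # (subject, predicate) still waiting for text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" not in attrs:
            return
        subject = attrs.get("about", self.base)
        if "content" in attrs:
            # The value is given directly in the attribute.
            self.triples.append((subject, attrs["property"], attrs["content"]))
        else:
            # Otherwise the value is the element's text content.
            self._pending = (subject, attrs["property"])

    def handle_data(self, data):
        if self._pending and data.strip():
            self.triples.append(self._pending + (data.strip(),))
            self._pending = None

    def handle_endtag(self, tag):
        self._pending = None


def extract(xhtml, base):
    parser = TripleExtractor(base)
    parser.feed(xhtml)
    parser.close()
    return parser.triples


PAGE = """
<html><head>
<meta property="dc:creator" content="Jane Example" />
</head><body>
<h1 property="dc:title">La Vida Robot</h1>
<span about="http://example.org/other" property="dc:date">2005-03-31</span>
</body></html>
"""

triples = extract(PAGE, "http://example.org/post")
for triple in triples:
    print(triple)
```

The point is that the page stays ordinary XHTML for browsers, while a consumer that knows this one generic convention gets triples out without having to learn a new micro format for every kind of data.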

The benefit is that you can do all the fancy stuff that people are working towards doing with RDF, while at the same time getting most of the simplicity that Tantek and the guys at Technorati are going for with the micro formats.

You can have your XHTML and still get RDF too

Another approach to solving this problem is GRDDL:

A mechanism for using transformations (in XSLT in particular) to express the relationship between XHTML dialects and RDF in order to expose the data in these dialects to the Semantic Web. The mechanism extends straightforwardly to XML formats in general.

GRDDL would allow micro formats to live alongside RDF eating agents by letting the XHTML specify a transformation, for instance an XSL document that specifies how to transform the XHTML into RDF.

GRDDL could be used for RDF-A as well, obviating the need for an RDF processor to specifically know RDF-A or future alternative syntaxes as long as it knows how to apply the transformations.

See GRDDL as the glue that makes it possible for you to more or less ignore the "war" between RDF and micro formats as markup if all you want to do is write apps that consume RDF.

In the end I prefer RDF-A over micro formats because it seems to give more potential for reuse.

Mike Linksvayer knows this stuff better than I do, and has this to say (this was where I found cool stuff like RDF-A)

Who'll tag all this stuff?

In this article Russel Glass raises exactly that question and goes on to say:

Just as the Web, however, allowed an upstart like Amazon to compete with Barnes & Noble, the Semantic Web has the potential to level the playing field again for a whole new generation of startups. With the Semantic Web in place, any vendor will have the ability to tag their product information and make it as easily accessible as Amazon.

I agree with this. One of the key values of the semantic web is that it breaks down virtual monopolies. Today, it takes a tremendous effort to gather together product information to set up a product search, for instance, because the information is harder to find than need be. Contrast it to how easy it is to find news items via RSS. Now imagine that a retailer can achieve only a one percent sales increase thanks to aggregators and new search services if they tag their data. They'd jump on it instantly.

Will the Semantic Web become a success?

Notes from The Semantic Web: Promising Future or Utter Failure, a panel discussion with Linksvayer, Galbraith, Marlow, Haughey, and Champeon is a good read for a quick introduction to the various viewpoints.

philwilson.org: Finding related items in your RSS datastore points out another issue: Once you have all this information tagged, as we do with blog entries via RSS for instance, how the f**k do we actually find what we want in between all the cruft that's bound to show up (take a look at the list of blogs at blo.gs for instance, and you'll see a worrying number of pure spam blogs), as well as all the stuff posted with a good intention that simply isn't your cup of tea.

Last time I used a typewriter (and other old tech)

I was just passing by the stationery cupboard, and purely by chance I noticed a bunch of bottles of Tipp-Ex, and thought to myself "wow, is anybody actually still using that?"

I came to the conclusion that the last time I had used it must have been the last time I used a typewriter.

When could that be? I quickly assumed at least ten years ago, but it must be longer. My mom and dad both worked with PC's from '85/'86 or around there, but both had typewriters in their offices as well, as printing letters was still not generally done where they worked - the printers didn't give good enough results to be considered acceptable for external letters.

I think by the last time I had a summer job at my dad's office, they'd stopped using typewriters at all, and that must have been no later than '90, I think.

In '87 I got a "state of the art" typewriter with a tiny (40 characters I think) LCD display and memory for corrections (it used a dot-matrix print head and wouldn't actually print until the buffer was full). I don't think I used that past '90 or '91 either.

So I've been entirely typewriter free for at least 14 years...

It made me feel rather old, and I'm just turning 30 this year (April 21st, by the way, in case anyone feels like buying me some expensive gadgets :-) ).

What other outdated technology did I use while growing up? Cassette recorders were a large part of it, and so was vinyl (which to me is obsolete, though I realise there is still a specialist market). Video tapes are now almost dead, with the first video stores already removing their last tapes to make room for DVD's.

Film cameras will still hang around, I guess, and while I think you can still buy Polaroid cameras, I haven't seen one in person in at least a decade - who needs it with digital?

Of course I've been through several types of obsolete home computers, but they've been part of evolution rather than a move to something entirely new.

But I do fully expect to leave my last CRT screens behind this year or next... And I won't miss them. Back when I was a kid (boy do I feel old writing that), we were all sort of assuming everyone would have 80" flat screens built into their walls by now...

Does anybody remember the analog landline phones? Ok, so I'm being a little bit sarcastic, but I'm all wireless, all digital there finally. I don't really know why I still have a landline at all really, as I almost exclusively use my cell phone (before I moved to the UK in 2000 I had an ISDN landline, and never connected a handset, it was for data only - people were confused about why they'd find me in the phone book but never get an answer...)

Modems. I hate modems with a vengeance, probably because I once ran an ISP and had to deal with US Robotics Sportster modems hanging on a regular basis. Haven't used a regular phoneline modem since '98. Haven't even seen one in years. And I threw my last ISDN card out in 2000.

For many years I thought that things were moving slowly, and I'd never see the kind of changes my parents saw from when they were kids, but looking back it's clear I was wrong.

Heck, I've used the internet for more than 10 years now, and BBS's from a couple of years prior to that, and the transformations are already massive, though in some ways harder to spot because they aren't tangible.

But the disappearance (more or less, for me anyway) of typewriters is still strange to me. They were so integral to my childhood, as I was constantly creating some newsletter or magazine or something (which I frequently sold, based on the concept that as a 6-7 year old child people will buy your product regardless of quality just because they think it is cute...)

Collective Type: Cooperative font design

Devon Strawn: Collective Type, Wikis, and Eigenmedia pointed me to Collective Type, a project to explore what happens if you let random people affect a font design. Each character in the final font will be composed of a combination of 255 images provided by visitors to the site.

While not suitable as a font for normal use, the result is readable, and it's an interesting experiment exploring ways to harness collaborative design.

Say it with software art...

... at runme.org, a fascinating collection of links to things like Perl poetry, and evil abuses of C (aka Duff's device).

Are religious beliefs inherited?

A study by U Minn researchers claims that your predisposition to being religious is at least in part genetically determined.

Randall Parker over at Future Pundit comments on the study, saying among other things:

"If any of the Minnesota researchers see this then what would be extremely interesting would be to collect fertility data on these men. Are the more religious men reproducing at higher rates than the less religious? Are genes for religiosity being selected for? I'm guessing the answer is Yes!"

I wouldn't be surprised if he's right. Religious beliefs or the lack thereof have "always" been a major factor affecting social inclusion or exclusion, and as an atheist I have no problems seeing that I would be at a disadvantage socially in many situations even today. Go back a few hundred years and I might have ended up being executed for my beliefs.

It would explain why holding a rational discussion about the scientific testability of religion is so hard, too...

But for now I'll take this study with a pinch of salt, as I haven't read more than the press release.

March 16, 2005

The lowercase semantic web

Tantek Çelik and Kevin Marks have a presentation titled Real world semantics covering micro formats and other building blocks of a "lo-tech" version of the semantic web focusing on an evolutionary, developer led approach rather than the committee approach.

Make your customers work for you

Brian over at FutureWire has written a short article called Smart Companies Putting Their Customers to Work based on this article over at the Economist.

Harnessing advanced customers is something I really believe in. It's also one of the best foundations for startups: You should create a company for which you would be the ideal customer.

In the past, I've been in situations where my suggestions were brushed aside because I was not considered a "typical user", and hence was considered useless for market research. But I always firmly believed that to be the wrong approach, as in any field "typical users" never stretch the envelope. They know of a limited subset of features.

Limited subsets do not create happy users.

As previously discussed, all of those seemingly useless (to the "typical user") features are highly likely to be used by somebody, and almost every unusual feature is likely to be the one feature which makes a couple of your users pick you over the competition.

Advanced users are valuable not only because they are likely to be intelligent and innovative, but because they are users that are likely to exploit otherwise unused functionality and stretch an application to its limits.

Users like that will inherently in many ways represent a wide subset of an application's user base in that they will know features that are in use by many small, non-overlapping segments of your userbase, and can compare features against each other and can see why these groups of users like or don't like particular features.

Building a startup around what you want for yourself instead of what you think the market may want is the way to go, because the market doesn't have intelligence - the market is just the amalgamation of users that all know about a feature or two they want that most other people don't want.

If you listen to "the market" you will end up with something that meets most needs of most people, but makes hardly anyone fully satisfied.

Meanwhile your competition will be busy listening to their advanced users and adding all those seemingly useless bells and whistles.

Danny Ayers: XML and/or RDF

Danny Ayers has another great article on RDF, using Amazon's OpenSearch Description Document. He briefly ties it in to FOAF and DOAP (an RDF schema describing a vocabulary for describing an open source project).

Giving balanced feedback - How not to be a jerk

Over the years I've been involved in the design and planning phases of a wide range of projects, and I've always been very outspoken because I tend to have a lot of opinions about both strategic issues and technical issues surrounding the projects I get involved with.

However, it took a random conversation to realise my biggest mistake in the feedback I gave...

At one of my previous companies I worked together with a very talented guy on the legal side (and while I'm exceedingly bad at forming lasting personal relationships with people I work with, I can honestly say he was one of the few people that went through management at that company I'd still trust in a business situation), and in conjunction with a relatively long-winded contract review process we were both involved in, we were at some point sitting in the meeting room with a few other people.

Suddenly he told me:

- When I started I thought you were a real jerk.

Initially I was just dumbfounded, because I didn't think I'd ever given anyone there a reason to think that (well, once, but that was quickly sorted out), though I might not make great efforts at forming relationships or making friends at work. I thought it was insensitive and rude, and started thinking about some really sarcastic reply.

But he quickly followed up that comment with an explanation. The reason he thought I was a jerk was that he thought that almost every time he suggested something, I came up with a long list of problems with his idea.

But then he realised I came up with a list of problems because there were real problems with the proposals, and I didn't do it just to him.

To me, I had just been mindful of my job. I'd analysed the problem and found a list of issues that needed solutions. And I expected people to get that, and work with me on finding solutions that worked, or drop the idea flat if it couldn't work.

Lesson one - Present possible solutions, not problems

What I should have done, and which I generally do now as a result of thinking through that conversation and how I came across (Thanks!), is not to ignore the problems, but to present solutions.

They may not be workable, fully fleshed out solutions - sometimes they'll just be rough ideas of areas to explore - but they're a starting point.

It's important to see the difference between glossing over a problem, and saying "we might have to do X and Y to make that proposal work, because of Z". Instead of shooting down the proposal you'd present ideas that make the original idea work, even if it totally changes the character of that idea and the end product is something different than originally envisaged.

The net result is the same for the business, but with one big difference when it comes to the work environment: You're not making someone else potentially look bad for having proposed something "which couldn't possibly work", and you are not looking bad for being the guy that's always bringing up problems.

Lesson two - Compliment on ideas even when they have problems

I'm not a very outwardly emotional person. I won't run up to someone and shake their hands, or loudly and excitedly tell everyone how happy I am about something. Which is fine. However I also don't easily give simple verbal compliments.

This isn't necessarily because I'm not happy about what someone has done, but because I'm generally a very positive person and expect to be happy about what someone has done.

However, think for a second about what it looks like to other people: You never tell them they've done a good job. You never praise them for an idea. You never talk them up around other people. All because you're really, really happy about the work they do, and that is how you'd expect it to be... But they won't know, and neither will their colleagues.

It sounds obvious, but it isn't. I didn't realise for years, because to me that kind of feedback has never been important - I have a large enough (some might say inflated ;) ) ego to believe in myself significantly more than I believe in praise from someone else, so I've never seen it as something that was important to give other people. Though given my managers over the years I'd say most of them probably fit in the same category.

Feedback is vital. Not only to your direct reports, but to anybody you encounter in your daily work, such as the guy I mentioned that I was often in meetings with. He had lots of great ideas, otherwise I wouldn't have bothered commenting on them. If his ideas weren't worth exploring, there would have been no point in me spending time finding problems with them, because they'd die a quick death and not become an issue.

So because I expected good ideas from him, I'd sit in meeting after meeting and tear his ideas to shreds in an effort to make sure we resolved any problems so they'd work.

Picture being on the other side of that, and knowing that when you open your mouth next, some tech guy that's generally otherwise keeping his mouth shut and that you don't really know that well on a personal level will promptly decide to produce a list of 15 bulleted items with why your idea will fail.

As noted above, my first error was to produce a list of 15 bulleted items with why the idea will fail instead of a list of 15 proposals for how to fix the idea so it works.

But that still seems negative and still comes across as criticism when it really isn't, and so my second major mistake was never saying "that's a great idea!" before producing my list, and not pointing out all the things that were good and should be kept.

Praise an idea when it is offered and before you suggest changes, and praise the person one to one, both to the person in question and to his/her manager. Don't go to the rest of your team and talk about how great this particular person is, as it'll easily cause grumbling, but do praise a specific item of work.

Trust me, no matter how obvious it is to you that you respect someone, unless you say so face to face to the person in question or to people who will pass the feedback on, that person is likely not to realise.

Again, this is something you'd think was obvious, but it's an area where I kept failing, and where I know from personal experience I'm far from the only one.

(Nothing reveals things like this better than exit interviews, when you sometimes see just how big the gap is between the exiting employee's and their manager's perception of the value placed on the employee - I'd like to know how much money companies lose in recruiting costs over this alone)

Lesson three - When you've been a jerk, apologise

It's inevitable to come across too harsh every now and again. Some ideas NEED to be torn to shreds so they can be buried. But even bad ideas are sometimes worth praising because someone took the time to try.

When an idea is genuinely bad, explain why, suggest other avenues to consider, and point out that you'd like to discuss other ideas. Point to parts that seem promising, and try to figure out why the person proposing an idea thought it was good and would work, and praise the good parts of their thinking. This is constructive feedback that both encourages openness and helps the proposer understand how to do a better job next time.

Just try not to lecture.

But if you've still come across as a real bastard afterwards, apologise. Pull the person aside afterwards, or even apologise in the meeting, and make it a point to explain that it's all about the issue and not the person, and point out the part of their work you really respect, or related ideas they've had that might be more worthwhile.

Honeynet project estimates 1 million machines in use as botnets

Netcraft has this report Netcraft: Honeynet: At Least 1 Million Machines in use as Botnets.

Know your Enemy: Tracking Botnets paints a scary picture. With one botnet reaching more than 50,000 compromised machines, the potential damage both to the owners of these machines and to the internet at large if these numbers are right could be huge - defending against things like distributed denial of service attacks is hard enough for well established internet companies - smaller players could be crushed easily.

The size of these botnets also opens up the possibility of massively distributed harvesting of personal details such as credit card numbers.


AOL clarifies IM privacy guarantee

According to an article by Declan McCullagh at News.com, AOL plans to revise its user agreement after the uproar over their excessively restrictive terms. I mentioned the AIM Terms of Service here.

Lessig takes stand against restrictive publication agreements

On his blog Lawrence Lessig notes that he's committed himself to open access for future works to be published in academic journals.

Quote: "I will not agree to publish in any academic journal that does not permit me the freedoms of at least a Creative Commons Attribution-Noncommercial license."

This deserves applause. I hope more will follow. On the computer science side we already have a reasonable level of open access, and services like CiteSeer provide an invaluable resource, but it only works when people stand up for their rights and refuse to give everything away to academic journals that far too often try to lock things away.

March 15, 2005

BBC: Blogging 'a paedophile's dream'

Newsflash: Technology can be used for bad things too

Next on Persecution Today: Psychologist warns phone is evil, as it can be used by paedophiles to exchange information, and printers should be banned because they can be used to print illegal photos.

Now, my question to the BBC is this: Where are the alternative views? What about trying to present a fair and balanced view, instead of resorting to sensationalist propaganda?

It's official: The EU is a banana republic

The article European Union is a banana republic points out that thanks to the software patents debacle one of the first results if you search for "banana republic" on Google is now an EU website.

Let's make it worse. Link to the EU banana republic.


Tracking GPL Violations

The gpl-violations.org project is just what it says - a project to track and address violations of the GNU General Public License.

The Long Tail of Recombinant Components - Bridging the Semantic Gap

In Manageability - The Long Tail of Recombinant Components, Carlos E. Perez expands on the previously discussed article The Long tail of Software. Millions of Markets of Dozens by addressing how this relates to recombinant computing - or the ability to assemble an application from configurable components.

(See also Strong Signals: The Recombinant Corporation by John Parkinson for a higher level look at recombinant components.)

The idea of recombinant components is about more than just being able to configure components and tie them together - it is about being able to adapt and reuse software without engineering in the traditional sense.

One of the key problems that needs to be solved in order to allow proper interaction between disparate components is semantic disparities between the vocabularies they use to communicate. Well defined ontologies will likely play a key role, and as such the Semantic Web technology is likely to play an important role by bridging the semantic gap between components created in different environments.

This semantic gap is present in many areas of software engineering - different components operate at different levels of complexity, or make different assumptions about how a system should work.

Even components that operate on the same level of complexity will often end up using disparate technologies to communicate with the outside world, or different vocabularies if they happen to share the same external representation (such as XML).

To allow a user to reconfigure an application, you can't rely on components where "glue" code has to be written in a programming language to modify how the application works. Even basic scripting often proves a significant barrier, and is brittle in the face of evolving software.

OWL - the Web Ontology Language, a component of the Semantic Web - may prove to be an enabling technology for recombinant components by allowing software to reason about software: provide facts about the software in OWL and RDF, and some degree of bridging the semantic gap can be automated, provided that components are built with it in mind.

Some work has already been done to represent UML in RDF (see also A Discussion of the Relationship Between RDF-Schema and UML), allowing UML diagrams describing parts of a component to be converted to a form where they can be reasoned about automatically.

With components being well documented with UML, the UML being represented as RDF, and the appropriate RDF or OWL schemas, one would be approaching a situation where one could imagine a visual interface for recombining components based on automatically generated "bridges" between disparate vocabularies, where the software would be able to give guidance as to what connections make sense from a semantic perspective.

I'll probably revisit this subject once I've spent some more time reading up on the semantic web.

Newspapers from 1817 to 1930 searchable online

Related to yesterday's entry on paywalls I thought it worth pointing to The Scotsman Digital Archive.

The Scotsman has made every copy from 1817 to 1930 available online. The archive is NOT free - it costs from £7.95 for a 24 hour pass to £159.95 for a one year pass. However this article on the creation of the archive is a worthwhile read.

The archive will apparently also be extended up to 1950 come April this year.

So related to the discussions on placing content behind a paywall - what about content that was previously mostly unavailable?

The Scotsman archives, like most archives of printed paper, have previously only had very limited availability due to the physical limitations of access, so even if it costs money, this is arguably a significant step forwards - a valuable resource for historians and genealogists for instance.

Archives this old are perhaps unlikely to achieve significant advertising revenue, so one can understand why they attempt to charge.

But would the long tail be applicable? I.e., can you expect even the more obscure content to eventually be of interest to someone, and would that make advertising supported free access better long term combined with the increased exposure free access would create?

While pondering those questions, take a look at the first ever issue of the Scotsman from January 25th 1817. If the content doesn't interest you, take a look at the user interface.

Exploitation through job offers

The BBC reports that the UK's NHS is 'taking away Africa's medics'.

While these are people who voluntarily come to the UK seeking a better life, the article raises some serious questions:

- The BBC claims the doctors and nurses who have migrated to Britain cost the African nations £270 million to train.
- They also claim Britain has saved ten times that by not having to train its own doctors and nurses.
- Yet over the last 5 years Britain has donated £560 million towards healthcare in Africa.

So while money is being donated to healthcare, Britain is massively increasing the cost of filling positions in the healthcare system for the same groups of countries.

While I support immigration and strongly believe that the levels of asylum seekers most developed countries accept are pitiful, at the same time it's worth asking the question of whether the right thing to do - not for the individuals wanting to come, but for their home countries - would be to make it harder for certain groups of people that are purely economic migrants to come to work.

One approach might be to give visas for limited time periods only, or to require a certain number of years of service in their home countries first.

The value of such restrictions might quickly add up to more than current aid levels. Yes, it will cost money in the form of higher education costs to make up for lack of migrant workers, but that is money that will eventually need to be invested anyway for any country that is serious about combatting poverty.

The current state of relying on migrant workers is based on an expectation of continued poverty, as that is the main driving force for people to leave these countries.

What good is aid if you take it back with the other hand?

Type composition in static languages

Over the years, one of the things that have kept fascinating me is the subject of runtime type composition. In other words: Combining a complete type from separate implementations of interfaces.

Or more specifically, how to do this in an efficient way in a statically typed language, such as C++. Languages such as Smalltalk and ECMAscript / Javascript support this easily, but apart from syntactic ease of composition, their approaches do not do much better than what you can reasonably easily implement in C++.

One article always worth mentioning is Protocol Extension: A Technique for Structuring Large Extensible Software-Systems by Dr. Michael Franz, who outlines an experimental approach he devised for Oberon.

Warning: Long rant following... Lots of implementation details glossed over :-)

Quick and dirty type composition in C++

A basic approach to type composition, or protocol extension in C++ is something like the code below, which mirrors an approach used by some component systems:

#include <iostream>
#include <map>
#include <string>
#include <typeinfo>   // needed for typeid
#include <utility>

// ---- Generic stuff --------------

class InterfaceBase {
public:
    virtual ~InterfaceBase() { }
};

class Object {
    // Maps the (implementation defined) type name of an interface to its implementation.
    typedef std::map<std::string,InterfaceBase *> IfaceMap;
    IfaceMap ifaces_;
protected:
    template<typename T> void addImpl(T * ob) {
        ifaces_.insert(std::make_pair(typeid(T).name(),ob));
    }

public:

    // "Casting" the object to an interface pointer looks the implementation up by type name.
    template<typename T> operator T *() {
        IfaceMap::const_iterator i = ifaces_.find(typeid(T).name());
        if (i == ifaces_.end()) return 0;
        return dynamic_cast<T *>(i->second);
    }

};

// --------- Interface declaration

class FooInterface : public InterfaceBase {
public:
    virtual void foo() = 0;
};

// ---------- Specific interface implementations

class FooImpl : public FooInterface {
public:
    void foo() { std::cout << "foo" << std::endl; }
};

class FooObject : public Object {
public:
    // Quick and dirty: the implementation is never deleted.
    FooObject() { addImpl<FooInterface>(new FooImpl()); }
};

int main() {
    FooObject ob;
    FooInterface * foo = ob;
    foo->foo();
}

The downside of this should be fairly obvious: It is painfully wasteful for small "objects". It's not really type composition at all, merely some syntactic tricks to mask the fact that we're dealing with a container of disparate objects.

A way to reduce the wastage for the above approach might be to make sure we allocate one block of memory only. It's not particularly difficult - we'll need a factory which will need to know which interface implementations to use, and can determine the binary layout of the object, and then create a mapping table, much like the vtable used for virtual methods.

The factory can then allocate the memory and use placement new to create the objects.

The obvious issue with this method is that if you use malloc to create this object, it can't be deleted with the normal delete operator, and if you use the array version of new you have to remember to use the array version of delete.

One workaround is to always wrap them in a smart pointer that knows how to deal with them, or use a garbage collector - otherwise you have to explicitly call a function that knows how to deal with the composed objects.
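A minimal sketch of the single-block idea, under stated assumptions: the interfaces `Named` and `Sized`, their implementations, and the names `Composed`, `makeComposed` and `destroyComposed` are all made up for illustration, and `std::unique_ptr` (C++11) stands in for "a smart pointer that knows how to deal with them". The header struct plays the role of the mapping table, and error handling for a throwing constructor is left out.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <string>

// Hypothetical interfaces and implementations, for illustration only.
struct Named { virtual ~Named() {} virtual std::string name() const = 0; };
struct Sized { virtual ~Sized() {} virtual int size() const = 0; };
struct NamedImpl : Named { std::string name() const { return "widget"; } };
struct SizedImpl : Sized { int size() const { return 42; } };

// Header of the composed object. The implementations live directly
// after it, inside the same malloc'd block.
struct Composed {
    Named * named;
    Sized * sized;
};

// Round offset n up to a multiple of alignment a.
inline std::size_t alignUp(std::size_t n, std::size_t a) {
    return (n + a - 1) / a * a;
}

// The explicit "function that knows how to deal with the composed object":
// run the destructors by hand, then release the single block.
inline void destroyComposed(Composed * c) {
    if (!c) return;
    c->sized->~Sized();
    c->named->~Named();
    std::free(c);
}

struct ComposedDeleter {
    void operator()(Composed * c) const { destroyComposed(c); }
};
typedef std::unique_ptr<Composed, ComposedDeleter> ComposedPtr;

// The factory: works out the layout, allocates once, and uses
// placement new to construct each part at its offset.
inline ComposedPtr makeComposed() {
    const std::size_t namedOff = alignUp(sizeof(Composed), alignof(NamedImpl));
    const std::size_t sizedOff = alignUp(namedOff + sizeof(NamedImpl), alignof(SizedImpl));
    char * block = static_cast<char *>(std::malloc(sizedOff + sizeof(SizedImpl)));
    if (!block) throw std::bad_alloc();
    Composed * c = new (block) Composed();
    c->named = new (block + namedOff) NamedImpl();
    c->sized = new (block + sizedOff) SizedImpl();
    return ComposedPtr(c);
}
```

Calling `makeComposed()` gives back a pointer through which `p->named->name()` and `p->sized->size()` can be used; when the pointer goes out of scope the custom deleter runs, so nobody has to remember not to call plain `delete`.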

A better approach is this:

template<int Size>
class Object {
protected:
    char buffer[Size];

    // ... the rest...
};

So instead of inheriting from Object the class, new types will inherit from Object<SomeCalculatedSize> the template instantiation.
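As a rough illustration of inheriting from the template instantiation, here is a sketch with a single interface. `InlineObject` is a renamed stand-in for the `Object<Size>` template, `Greeter` and `GreeterObject` are hypothetical, and the `alignas` specifier (C++11) is an addition to make the buffer safely aligned for whatever gets constructed in it.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Hypothetical interface and implementation, for illustration only.
struct Greeter { virtual ~Greeter() {} virtual std::string greet() const = 0; };
struct GreeterImpl : Greeter { std::string greet() const { return "hello"; } };

// Stand-in for Object<Size>: the storage is part of the object itself.
template<std::size_t Size>
class InlineObject {
protected:
    alignas(std::max_align_t) char buffer[Size];
};

// A composed type inherits from the instantiation sized for its parts.
class GreeterObject : public InlineObject<sizeof(GreeterImpl)> {
    Greeter * greeter_;
    GreeterObject(const GreeterObject &);              // copying not handled in this sketch
    GreeterObject & operator=(const GreeterObject &);
public:
    // Construct the implementation inside the embedded buffer.
    GreeterObject() : greeter_(new (buffer) GreeterImpl()) {}
    // Placement new means the destructor has to be called by hand.
    ~GreeterObject() { greeter_->~Greeter(); }
    operator Greeter *() { return greeter_; }
};
```

Since the storage is an ordinary member, a `GreeterObject` can live on the stack or be created and destroyed with plain `new` and `delete`, which is the point of this variant.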

The next refinement here is probably to use typelists with a lot of template magic in order to simplify the creation of the composed types.

Refining the beast

But it's still inefficient: It contains a list of pointers to interface implementations in each object. And it's also cumbersome to use: You can cast TO an interface, but you can't cast FROM an interface to the base object.

The first problem is "easy" to take care of by implementing our own "vtable". Instead of containing just a flat buffer, each Object contains a pointer to an array. Each type implementation can initialize each new object with a pointer to a type specific vtable that gives the offsets from the start of the object to the specific interface implementation.
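One way this home-made "vtable" could look is sketched below. It assumes every interface in the system is given a small integer id; `IfaceId`, `IfaceIdOf`, `OffsetObject`, `Duck` and `Mute` are invented names. The table is filled in lazily the first time a type is constructed, which is not thread safe, and the pointer arithmetic relies on the usual object layout rather than anything the standard promises.

```cpp
#include <cstddef>
#include <string>

// Hypothetical interfaces; each one gets a table index.
enum IfaceId { SPEAKER_ID = 0, COUNTER_ID = 1, IFACE_COUNT = 2 };

struct Speaker { virtual ~Speaker() {} virtual std::string say() const = 0; };
struct Counter { virtual ~Counter() {} virtual int next() = 0; };

template<typename T> struct IfaceIdOf;
template<> struct IfaceIdOf<Speaker> { static const IfaceId value = SPEAKER_ID; };
template<> struct IfaceIdOf<Counter> { static const IfaceId value = COUNTER_ID; };

const std::ptrdiff_t NO_IFACE = -1;   // marks an interface the type lacks

class OffsetObject {
    const std::ptrdiff_t * offsets_;  // one pointer per object, table shared per type
protected:
    OffsetObject() : offsets_(0) {}
    ~OffsetObject() {}
    void setOffsets(const std::ptrdiff_t * table) { offsets_ = table; }
    // Distance in bytes from the start of this object to one of its parts.
    std::ptrdiff_t offsetOf(const void * part) const {
        return static_cast<const char *>(part) - reinterpret_cast<const char *>(this);
    }
public:
    template<typename T> T * as() {
        const std::ptrdiff_t off = offsets_[IfaceIdOf<T>::value];
        if (off == NO_IFACE) return 0;
        return reinterpret_cast<T *>(reinterpret_cast<char *>(this) + off);
    }
};

struct SpeakerImpl : Speaker { std::string say() const { return "quack"; } };
struct CounterImpl : Counter {
    int n;
    CounterImpl() : n(0) {}
    int next() { return ++n; }
};

class Duck : public OffsetObject {    // implements both interfaces
    SpeakerImpl speaker_;
    CounterImpl counter_;
public:
    Duck() {
        static std::ptrdiff_t table[IFACE_COUNT];
        static bool ready = false;
        if (!ready) {
            table[SPEAKER_ID] = offsetOf(static_cast<Speaker *>(&speaker_));
            table[COUNTER_ID] = offsetOf(static_cast<Counter *>(&counter_));
            ready = true;
        }
        setOffsets(table);
    }
};

class Mute : public OffsetObject {    // implements Counter only
    CounterImpl counter_;
public:
    Mute() {
        static std::ptrdiff_t table[IFACE_COUNT];
        static bool ready = false;
        if (!ready) {
            table[SPEAKER_ID] = NO_IFACE;
            table[COUNTER_ID] = offsetOf(static_cast<Counter *>(&counter_));
            ready = true;
        }
        setOffsets(table);
    }
};
```

With this, `duck.as<Speaker>()` costs one table read and an addition, and because the table holds offsets rather than addresses, every `Duck` can share the same one.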

The second is more problematic. One way of doing this is to put a pointer back to the base of the object at a fixed offset from each interface implementation, but that means wasting an additional pointer for each implemented interface.
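The back pointer variant might look something like the sketch below, where the pointer is kept in a common interface base so it always sits at the same place relative to the interface. `Host`, `LinkedInterface` and `Describable` are hypothetical names, not taken from any existing component system.

```cpp
#include <string>

class Host;   // hypothetical composed object

// Common base of all interfaces; carries the pointer back to the owner.
class LinkedInterface {
    Host * owner_;
protected:
    LinkedInterface() : owner_(0) {}
public:
    virtual ~LinkedInterface() {}
    void attach(Host * owner) { owner_ = owner; }
    Host * owner() const { return owner_; }   // the cast FROM an interface
};

struct Describable : LinkedInterface {
    virtual std::string describe() const = 0;
};

struct DescribableImpl : Describable {
    std::string describe() const { return "host part"; }
};

class Host {
    DescribableImpl describable_;
    Host(const Host &);                // copying would leave stale back pointers
    Host & operator=(const Host &);
public:
    Host() { describable_.attach(this); }
    Describable * describable() { return &describable_; }
};
```

The cost is visible in the sizes: each implementation now carries the back pointer in addition to its virtual table pointer, once per implemented interface.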

Another approach is to use a "fat pointer" - a smart pointer that contains extra state, for instance a pointer to the base object. This saves us extra pointers in the object itself, but doubles the size of our "pointers", which could be really painful in some applications.
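A fat pointer along those lines could be sketched as follows. The composed type `Cell`, its two interfaces, and the overloaded `get` lookup are assumptions made to keep the example small; a real version would look the interface up through whatever mechanism the object model provides.

```cpp
// Hypothetical interfaces, for illustration only.
struct Readable { virtual ~Readable() {} virtual int read() const = 0; };
struct Writable { virtual ~Writable() {} virtual void write(int v) = 0; };

struct CellReader : Readable {
    const int * v;
    explicit CellReader(const int * p) : v(p) {}
    int read() const { return *v; }
};
struct CellWriter : Writable {
    int * v;
    explicit CellWriter(int * p) : v(p) {}
    void write(int x) { *v = x; }
};

// Hypothetical composed object with two interface implementations.
class Cell {
    int value_;
    CellReader reader_;
    CellWriter writer_;
    Cell(const Cell &);                // copying not handled in this sketch
    Cell & operator=(const Cell &);
public:
    Cell() : value_(0), reader_(&value_), writer_(&value_) {}
    // The unused pointer argument only selects the overload.
    Readable * get(Readable *) { return &reader_; }
    Writable * get(Writable *) { return &writer_; }
};

// The fat pointer: interface pointer plus a pointer to the base object.
template<typename Iface>
class FatPtr {
    Cell * base_;
    Iface * iface_;
public:
    explicit FatPtr(Cell * base)
        : base_(base), iface_(base ? base->get(static_cast<Iface *>(0)) : 0) {}
    Iface * operator->() const { return iface_; }
    Cell * base() const { return base_; }     // cast back to the base object
    template<typename Other> FatPtr<Other> as() const { return FatPtr<Other>(base_); }
};
```

Calls through `operator->` are as cheap as with a raw interface pointer and nothing extra is stored in `Cell`, but every `FatPtr` is two pointers wide.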

A third approach is to use a specialized smart pointer for each interface that will "cast" an internal pointer to the object to the specific interface implementation pointer on each call. This saves memory at the cost of at best an extra table lookup for each call. This is more or less the approach chosen for the Protocol Extension paper referenced above. However in that case the table contained all methods, not interface implementation types.

A table lookup is only feasible if the number of interfaces used is reasonably small and you can accept allocating a table large enough for all interfaces in the system for each object, though. If not, you'll incur the cost of a hash table lookup instead.
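The third approach, with a per-object table holding one slot for every interface in the system, might be sketched like this. It is not the scheme from the Protocol Extension paper, just an illustration; `SlotObject`, `SlotOf`, `IfacePtr` and the `Shape`/`Label` interfaces are hypothetical.

```cpp
#include <string>

// Hypothetical interfaces; each gets a slot in the per-object table.
enum SlotId { SHAPE_SLOT = 0, LABEL_SLOT = 1, SLOT_COUNT = 2 };

struct Shape { virtual ~Shape() {} virtual int area() const = 0; };
struct Label { virtual ~Label() {} virtual std::string text() const = 0; };

template<typename T> struct SlotOf;
template<> struct SlotOf<Shape> { static const SlotId value = SHAPE_SLOT; };
template<> struct SlotOf<Label> { static const SlotId value = LABEL_SLOT; };

class SlotObject {
    void * slots_[SLOT_COUNT];         // room for every interface in the system
    SlotObject(const SlotObject &);    // copying not handled in this sketch
    SlotObject & operator=(const SlotObject &);
protected:
    SlotObject() { for (int i = 0; i < SLOT_COUNT; ++i) slots_[i] = 0; }
    ~SlotObject() {}
    // Call as bind<Interface>(&impl) so the stored address is the interface's.
    template<typename T> void bind(T * impl) { slots_[SlotOf<T>::value] = impl; }
public:
    template<typename T> T * find() const {
        return static_cast<T *>(slots_[SlotOf<T>::value]);
    }
};

// The "cheap" smart pointer: holds only the object pointer and
// resolves the interface on every call.
template<typename T>
class IfacePtr {
    SlotObject * ob_;
public:
    explicit IfacePtr(SlotObject * ob) : ob_(ob) {}
    T * operator->() const { return ob_->find<T>(); }
    SlotObject * object() const { return ob_; }
    template<typename U> IfacePtr<U> as() const { return IfacePtr<U>(ob_); }
};

struct SquareShape : Shape {
    int side;
    explicit SquareShape(int s) : side(s) {}
    int area() const { return side * side; }
};
struct FixedLabel : Label { std::string text() const { return "square"; } };

class Square : public SlotObject {
    SquareShape shape_;
    FixedLabel label_;
public:
    explicit Square(int side) : shape_(side) {
        bind<Shape>(&shape_);
        bind<Label>(&label_);
    }
};
```

An `IfacePtr` is the size of a plain pointer and can be converted to any other interface or back to the object, at the price of the array read in `operator->` on every call. Swapping the array for a hash map would remove the fixed table size but make that lookup slower.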

None of these options are particularly attractive. One possibility is supporting more than one of them: Allow casting to a "cheap" smart pointer but not back, and provide a fat pointer alternative if you need to be able to cast both ways.

In the end this probably boils down to application specific needs. But my quest for a method that's simple AND fast enough for general use continues...

The simple joys of office life

We've been in the new office for a week now, and we're getting settled in. Apart from being one of a small group of unlucky people who have yet to receive a new security badge (I hope I shouldn't read anything into that...) everything is fine. Great in fact.

Such simple things as having a soda machine instead of a fridge that never got refilled, or a hot drinks machine, and even an unlocked stationery cupboard within walking distance (!) are making it quite enjoyable. Not that the old office was bad, but the new one is definitely of a significantly higher standard.

And it's fascinating how stressful such a simple thing as stationery can make office life when the stationery was stored in an office that would be locked on a regular basis and you felt the piercing eyes of the facilities manager judging you if you took one pen too many.

Though our desks are smaller in the new office, CRTs being replaced with LCDs helps with that, and despite the desk size most of us still have more privacy and storage space here.

The one downside: It takes several minutes to walk to reception or to the nearest quiet rooms... Ah well, I'll consider it exercise.

California judge backs gay unions

According to the BBC, California judge backs gay unions based on the argument that the state constitution prohibits discrimination.

I still have problems understanding why this is such a big problem for so many people. I respect that some people find the idea of gay marriages offensive, but if we are to prohibit everything that anyone finds morally objectionable or offensive, we might as well give up on civilisation right away.

March 14, 2005

XML with C++ - To DOM or not to DOM

After my rant earlier today, I've decided to toy around with writing my own C++ DOM parser, mostly as an educational experience, but perhaps it'll turn out to be useful.

My initial thought is to leverage boost::shared_ptr to get an interface that is as close to the Java DOM mappings as possible while getting a proper modern C++ look and feel to it.
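To give an idea of the look and feel in question, here is a very small sketch. It uses `std::shared_ptr` (the standardised descendant of `boost::shared_ptr`) so that it needs no external library, the method names follow the Java DOM binding, and `DomNode`, `DomDocument` and `NodePtr` are hypothetical names. It covers only a sliver of Level 1 Core and says nothing about how a finished implementation would be structured.

```cpp
#include <memory>
#include <string>
#include <vector>

class DomNode;
typedef std::shared_ptr<DomNode> NodePtr;

class DomNode : public std::enable_shared_from_this<DomNode> {
    std::string name_;
    std::vector<NodePtr> children_;   // a parent owns its children
    std::weak_ptr<DomNode> parent_;   // weak, so parent and child don't form a cycle
public:
    explicit DomNode(const std::string & name) : name_(name) {}
    virtual ~DomNode() {}

    const std::string & getNodeName() const { return name_; }
    NodePtr getParentNode() const { return parent_.lock(); }
    NodePtr getFirstChild() const {
        return children_.empty() ? NodePtr() : children_.front();
    }
    // Returns the appended child, as the W3C DOM appendChild does.
    NodePtr appendChild(const NodePtr & child) {
        child->parent_ = shared_from_this();
        children_.push_back(child);
        return child;
    }
};

// Nodes are only created through the document, which guarantees they
// are always held by a shared_ptr (shared_from_this relies on that).
class DomDocument {
public:
    NodePtr createElement(const std::string & tagName) const {
        return std::make_shared<DomNode>(tagName);
    }
};
```

Client code then reads much like the Java equivalent, for instance `root->appendChild(doc.createElement("entry"))`, with no explicit deletes anywhere.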

One thing is bothering me, though: More and more W3 initiatives have defined their own DOM mappings, such as SVG. In order to write a truly high quality C++ DOM implementation, it needs to be reasonably feasible to layer implementations of these specs "on top" of the core DOM.

One solution might be to make the factory methods configurable and write most of the classes as templates that can be combined to create a uniform class hierarchy, but I'm unsure if it's worth the effort, and how practical it will be...

I think I'll be spending more time reading up on design patterns and principles to get good ideas on how to solve this properly than on actually writing the DOM behaviour, as that is actually very simple.

Fedora Core Linux Blog

If you use Fedora Core, then Fedora Core Linux Blog is a great blog to follow.


Ajax, promise or hype

QuirksBlog: Ajax, promise or hype? is a great summary of AJAX, the discussion around it, some history, and some suggestions.

(If you've missed it, AJAX stands for Asynchronous JavaScript + XML. It's a paradigm for creating web applications by leveraging the support for sending HTTP requests and retrieving XML via Javascript, and for styling and processing XML/HTML via the DOM in modern browsers.)

UMBC Semantic Web Reference Card

Via: Danny Ayers, Raw Blog:

"The UMBC Semantic Web Reference Card is a handy "cheat sheet" for semantic web developers and programmers. It can be printed double sided on one sheet of paper and tri-folded. The card lists common RDF/RDFS/OWL classes and properties, popular namespaces and terms, XML datatypes, reserved terms, grammars and examples for encodings, etc."

Communities of purpose

Nova Spivack has a short blurb titled Minding the Planet: Communities of Purpose: The Third Type of Community. He mentions wanting to create tools to help people create more productive communities of purpose.

The question then, is what should such a tool support? The problem is that the answer is "it depends". It all depends on the purpose, which I guess is why most existing tools focus on what Spivack calls "communities of interest" and "communities of practice".

It's "easy" to create a generic tool like Yahoo! Groups that supports generic groups with discussions.

However once you move to communities of purpose, tools are likely to need to be more specialized. Consider Groklaw, and how it could benefit from tools allowing better support for managing the plethora of legal documents they host.

SourceForge is perhaps the best example of a tool designed for a community of purpose that I know of. It's a tool designed to help support the purpose of managing software projects, allowing you to easily create a custom community around your project.

The problem of course is that it's a niche tool, and while SourceForge's niche is quite large, it's easy to imagine a huge number of purposes that would benefit from specialized tools, but that are too narrow niches to warrant building a tool from scratch.

How do you blend existing types of tools to create something that is easy enough to customize without software engineering experience?

Free ride vs. 'pay-wall'

Dan Gillmor and Doc Searls have both commented on the New York Times article Can Papers End the Free Ride Online?

Both comment on how putting the content behind a "pay-wall" or "costwall" reduces their visibility on the net, thereby reducing their visibility to people who increasingly stick to online sources.

I'm one of them. I don't buy newspapers anymore. I used to read the paper cover to cover when I was a kid. Once I got online, my number of sources massively broadened, and my total news consumption increased, but at the same time I stopped reading paper copies of the newspaper.

However I'm probably spending more time looking at pages with advertising now than I did then.

I've long stayed away from the New York Times because of their annoying registration. It's not that I mind registration in general - I registered at Slashdot and many other sites I frequent regularly. However, keeping track of usernames and passwords at sites I visit only every once in a while is annoying, especially when it's only to read content, as opposed to participating in a discussion.

The New York Times simply isn't compelling enough, so instead I'll happily use BugMeNot.com.

The New York Times is perhaps the best example of this, to the extent where it's even presented as an example on bugmenot.com's main page.

The moment they require payment, I'll stay away permanently - the number of news sources available to me makes news a cheap commodity, and I don't value their opinion pieces and other unique content enough to pay.

Fewer eyeballs for their ads, less presence in search engines, less relevance, in a time where staying relevant and visible makes the difference between setting agenda and being ignored.

It's bizarre then, that this comes at a time when the New York Times has become blog friendly, letting anyone create a registration free link to refer to them from their blogs.

Suddenly they became more palatable to blogs that previously have shunned links to their site, and it seemed they'd started to get it.

Let's hope the current article is just talk.

Patent law for sale

If you liked the European Anti-Software Patent Bribe Pledge Drive, then Muriel's blog has the scoop for you...

Why REST is Better

Manageability - Why REST is Better - Part 1 - Explained in Code is part 1 in a series of (so far?) 5 articles over at Manageability.org (this blog is highly recommended if you're interested in software design).

It covers REST in detail, with lots of comparisons to SOAP, and provides plenty of useful advice on how to structure your REST API's.

SOAP and simplicity - why not just REST?

Sean McGrath's article 'Behind the firewall, nobody can hear you scream.' provides a great summary of some of the issues with SOAP.

Personally, my main gripe with SOAP (and with XML-RPC but to a much lesser extent), is that it's simply massively over-engineered for most purposes.

99% of the time, if I need to connect two systems over a network, and particularly over the internet, one of two things is true. Either the amount of data is very small, and it's more flexible to handle it "manually" so you don't depend on clients that you often don't control being able to talk SOAP properly, or the amount of data is very large, and wrapping it in SOAP makes it prohibitively larger.

Why is it that if you put a bunch of engineers in a room together and tell them to write a spec, almost inevitably they seem to come up with stuff that they'd probably have preferred not to have to use if someone else had presented them with an implementation?

C++ XML Parser rant

I've been toying with Atom over the weekend. Or should I say I've been frustrated by the quality (or lack thereof) of C++ XML parsers.

We have the C++ version of Xerces, but I've never liked it, probably because its refusal to use namespaces and exceptions, and the cumbersome (and slow!) string handling that results from the way it handles i18n, make it painful to work with. The new API also completely reverts to an old C++ style full of pointers instead of the RAII idiom or similar more modern techniques such as the Boost smart pointers.

libxml is very featureful, but plain C, and the C++ wrappers available leave much to be desired. It also doesn't provide a proper DOM interface, but an approximation that I find painful to use (because in many places it diverges from the W3 DOM for no good reason other than legacy). Documentation is also an issue - I've more than once resorted to the header files while looking for something that should have been trivial.

For SAX work Expat is a tolerable alternative, and there are some C++ wrappers out there. But it doesn't provide a DOM.

One possible solution to that might be Arabica which looks very promising as a DOM layer on top of Expat or libxml, but so far I haven't gotten it to compile at all, and haven't had much time to spend on it...

WHY is this so hard? I feel tempted to write a parser myself, or at least a DOM implementation - I've written parts of one before and also spent quite some time reading the Xerces source. I don't ask for much - level 1 Core and some minor parts of level 2 is all I ask for...

March 13, 2005

TV Listings via RSS/Atom?

I want my TV listings via RSS or Atom. Atom would be great because it allows arbitrary XML to be inserted, so you could add RDF triples to add machine readable versions of most of the information.

It would in particular be interesting to allow easy publication of customised schedules for fan sites etc.

Maybe I'll have to hack something together based on XML-TV

Now that I think of it, this would work great for things like playlists as well...

The long tail of software

I've just finished reading Bnoopy: The long tail of software. Millions of Markets of Dozens. and I highly recommend it.

The idea of a long tail is not new, but was popularized by Chris Anderson of Wired. The concept contrasts with the traditional 80/20 principle, and essentially refers to the countless situations where 80% of the benefit cannot be achieved by targeting 20% of the "problem" or market.

The Bnoopy article uses Excite as the main example. But Chris Anderson's blog The Long Tail has another interesting article on the long tail at Google (as well as many other interesting aspects of the Long Tail phenomenon).

Another good article is this Tech Central Station article called 'Chasing the Long Tail'.

Long Tail Marketing is a blog devoted to the concept of marketing to the tail.

at Technorati.

Do we need the Semantic Web?

ZDNet UK takes a brief look at the current status of the Semantic Web in this article.

I'm currently preparing an essay on the Semantic Web as part of my MSc. studies with the Open University, so it's a subject I'm particularly interested in, and in some ways it was actually my interest in the Semantic Web that finally got me to set up a blog. Why?

Because bloggers are among the first to actively embrace elements of the Semantic Web, such as RDF used to varying extents in RSS and Atom.

As such, we do need the Semantic Web. Every time we link in an RSS/Atom feed to a page, or add a FOAF profile to a page, we're building the Semantic Web.

Longer term, the work on ontologies and schema languages such as RDF Schema and OWL promises to bring a lot more advanced features, such as reasoning about data to allow agents to "understand" new data types and integrate data types using new vocabularies automatically into their data models.

However, short term, whenever the scripts you use to maintain your blog autodiscover trackback info, or you fire up your RSS reader, you are seeing the early fruits.

As for benefits, consider this: Bloggers are currently dealing with older RSS, RSS 2 and Atom, all of which have different vocabularies.

One of the most immediate benefits of the Semantic Web is unifying vocabularies. Instead of asking for the author of a feed, you'll use a library to query for a specific concept so that when somebody releases YetAnotherFeedFormat v1, and it's RDF linking to an OWL schema, your software would automatically know that your "author" concept matches YetAnotherFeedFormat's "SomeDudeWhoWroteAnEntry" tag, except that the latter also includes "CoolNickName", and would be able to use the data from the new format without having to release a new version.
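As a toy illustration of querying by concept rather than by tag name, consider the sketch below. The hand-filled table is only a stand-in for what software would derive from an RDF/OWL schema, no actual RDF or OWL processing takes place, and `ConceptResolver`, `FeedEntry` and `Vocabulary` are made-up names. The `dc:creator` mapping assumes an RSS 1.0 feed using the Dublin Core module.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> FeedEntry;    // tag -> value, as parsed
typedef std::map<std::string, std::string> Vocabulary;   // concept -> tag in one format

class ConceptResolver {
    std::map<std::string, Vocabulary> formats_;
public:
    // Record that, in the given format, a concept is expressed by a tag.
    // In the scenario above this knowledge would come from the schema.
    void declare(const std::string & format, const std::string & conceptName,
                 const std::string & tag) {
        formats_[format][conceptName] = tag;
    }

    // Look a concept up in an entry of the given format.
    // Returns false if the format, the mapping or the tag is missing.
    bool query(const std::string & format, const FeedEntry & entry,
               const std::string & conceptName, std::string & out) const {
        std::map<std::string, Vocabulary>::const_iterator f = formats_.find(format);
        if (f == formats_.end()) return false;
        Vocabulary::const_iterator tag = f->second.find(conceptName);
        if (tag == f->second.end()) return false;
        FeedEntry::const_iterator value = entry.find(tag->second);
        if (value == entry.end()) return false;
        out = value->second;
        return true;
    }
};
```

After `declare("YetAnotherFeedFormat", "author", "SomeDudeWhoWroteAnEntry")`, application code keeps asking for "author" and never mentions the new tag; only the table changes when a new format appears.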


Text banking trialled in Kenya

It has always fascinated me how current technology in many cases allows developing nations to leapfrog costly stages of the development that the industrialized countries went through.

The Inquirer has an article about how text messaging is being tested for banking in Kenya via a pilot headed up by Vodafone. What's particularly interesting is that the pilot bypasses banks, by using credit on top-up cards for the phones for the payments.

This follows a trend where developing countries including Nigeria and Morocco are now getting 3G, in many cases just a few short years after getting cellphone service at all. Cellphones already outnumber landlines in many countries because installing base stations is actually significantly cheaper than laying cable to each subscriber, especially in countries where salaries are low enough to make stealing the copper wire used for phone lines attractive (search for 'copper telephone').

(Inquirer article found via: textually.org: Text banking trialled in Kenya)

Groundbreaking fund for life extension research reaches $1 million

Via FuturePundit: Methuselah Mouse Prize Fund Reaches $1 Million: The fund is dedicated to rewarding research into aging, specifically via two prizes related to life extension in mice. Longer term, the goal is to translate the research into therapies for humans, initially to make it possible for humans to reach a 150 year lifespan in full health.

March 12, 2005

23 questions to the EU Council

In Letter to the Council: What happened on monday? FFII requests answers to 23 questions to clarify what actually went on at the infamous council meeting that approved the software patents directive.


Mangle other people's pages (legally of course)

Greasemonkey is heaven-sent if you ever run across sites with useful content that are persistently "broken" or have usability problems. It lets you write small scripts to modify another site's web pages on the fly as you're visiting them.

A wide range of ready made scripts are available, including scripts to skin del.icio.us, add saved searches to Gmail, show link titles inline for Blogdex, remove assorted filler content from CNN, and much, much more.

Canopy out of SCO

According to this article at News.com, some of the terms of the Ralph Yarro-SCO settlement include Canopy handing over their SCO shares to Yarro.

Why he'd actually want them is anybody's guess, but it looks like a face-saving measure for both parties: Yarro gets to say he got something substantial, Canopy gets to get rid of a troublesome legal case.

Groklaw also covers this here.

Are your sources legit?

(Update: News.com has posted this article which summarises reactions to the ruling)

Dan Gillmor on Grassroots Journalism, Etc.: Apple's "Trade Secrets" is an interesting comment on Friday's judgement that three bloggers who wrote about Apple trade secrets may have to divulge their sources.

The key issue seems to be whether or not these bloggers could have gotten their information from someone who had legal rights to release the information.

While I understand Dan's concern - if it stands this judgement may very well have repercussions for journalists' access to information - I also see why the judge is going in this direction.

Otherwise people who want to destroy trade secrets can do so simply by leaking the information to a journalist, safe in the knowledge that the journalist can publish without repercussions, and the source's name won't need to be handed over.

The key here is how you know whether or not sources have a legitimate reason for getting the story out, and how you ensure legitimate sources are protected while gratuitous use of source protection to destroy a company's trade secrets can be prevented.

I share Dan's concern that companies might try to claim trade secret protection for just about everything. However what is important is that trade secret protection can't be successfully defended if the information is already public, or if the information is available from a source who legitimately does not have any restrictions on releasing it.

That means that you're mainly in trouble if you report on facts handed to you by an employee or partner who has signed an NDA, and who has clearly broken the law by handing you the information in the first place - if I started handing out information from my employer it would be a blatant violation of my employment terms.

While there are times when such sources SHOULD be protected, such as when whistleblowing on illegal actions by the company, journalists also need to take some responsibility in their use of sources and ensure that they for instance don't end up as pawns in a game to cause damage for no good reason.

First mover advantage

In the business world you'll often hear about 'first mover advantage'. It has an almost mythical quality to it: Gain first mover advantage, and all will be well. Of course, everyone knows it's not sufficient for success, but it is seen as a way of massively increasing your chances.

But is that really so? There is an advantage in being early, before a market becomes entrenched. But even in that respect Virgin Group have demonstrated that you can often find significant lucrative opportunities exactly in a market that has become entrenched, because the competitive fervour of a growing market may have waned and left you with a few large players that know each other and know well enough that they have little to gain by upsetting the balance. Virgin upsets the balance, and their strength is being an unknown factor when they join a new market.

A problem with being the first mover is that you have to develop the market. You may strike lucky - Hotmail did, and managed to get a lead that made it incredibly hard for anyone to approach them in terms of size. However, for every company that gets lucky this way, there are thousands more that have "first mover advantage" in their field but never get far because the market is too hard to develop, and the cost for competitors to enter after you've spent massive amounts of resources developing the market is low.

My previous company is one example of a company fighting to grow in a market that's not yet developed to where we hoped when we started out. The company originated as a personalised webmail service called Nameplanet (the service is now owned by an unrelated company), and morphed into the registry for the .name TLD. The problem? The market for personalised domain names has so far been far smaller than everyone initially hoped. I still believe in the market for personalised addresses, but it's becoming clear that the market will require significant work before it's mature enough, and the cost of competing with alternative, but similar, products is extremely small.

I've lost faith in first mover advantage. Instead I prefer to think about second mover advantage as what you should aim for: Pick a field with one, or a small number, of underfunded companies burning cash to develop the market. Quietly prepare your product, and jump in when the pool is sufficiently full. You'll have the advantage of not having spent all your money building a market that is easily taken away.

The alternative is being a niche player or the unknown quantity. Enter an established market where the entrenched players keep doing the same boring things. Do something new. That's what Google has tried to do with Gmail.

One mistake: Make sure your main advantages aren't easily copyable. In Google's case it was storage - after their beta launch, all their competitors quickly scrambled to provide massive storage increases, and while most of them still don't match 1GB free, they offer hundreds of MB and larger premium products. Gmail does have an interface that is compelling for some, and by virtue of being Google they will gain a significant userbase, but for a smaller company a mistake like this would have been disastrous. (Btw. in case you don't know, I work for Yahoo! and while I don't speak for my employer, you might want to take that into account when assessing my biases...)

I learned that lesson the hard way with my first company - NorConnect - which I co-founded and was a director of. Our main mistake was to think that we'd make a big splash by undercutting all the main players in the Norwegian ISP market at the time. Instead we, and 2-3 other new ISPs, sparked a price war that got all the major, well funded players to slash their prices overnight, with the result that almost no ISPs made a profit on access for the next 4-5 years. In our case we sold the dial-up business for a pittance and went on to do consulting instead.

So what would be the best way if you, as a small player, want to enter the mail market just as Google did? Go for something that is directly at odds with the model of the large players. Something they CAN'T offer without massively changing their business models.

If I were to go back into the mail game, what I would have done was to revert to the original Nameplanet model: Offer a personalised service for a fee, and build it into a comprehensive messaging platform. This also happens to be close to the model of the company I joined after NorConnect - Telepost Inc. - a unified messaging provider.

Where I'd change things would be in offering a service that emphasises all those things you CAN'T offer in a free product: Extensive support, additional reliability and accessibility including extensive ACCESS to backup and failover facilities to increase the perceived enterprise readiness, any unified messaging services that would drive the cost of providing the service up above a level sustainable for free users, extensive access to all your data etc.

In other words: I'd aim for a service competing for a segment of the e-mail services users that they can't compete for with their free offerings, and where any attempt to go after the premium market would require a different product. At the same time, this is a market that IS developed, and where there are some players offering premium products, so competition would be fierce.

That's where second mover advantage comes in: Next time you see a Hotmail or Amazon rapidly rising, look for the segment they can't take and go for it. Let them build the market, and take the fringes. If they succeed, the market will get big and you can take a slice of it that'll give you a solid foundation. If they fail, they'll still have helped you by spending tons of cash building the market for you. If you're lucky you could corner the whole thing.

Almost the same potential, and significantly less risk.

AIM TOS: 'You waive any right to privacy'

Thrashing Through Cyberspace: AOL Eavesdrops, Grants Itself Permission To Steal Your AIM Conversations is a pretty shocking entry covering changes in AOL's Terms Of Service for AIM.

AOL essentially claims all rights to whatever you send through AIM, including rights to publish. The new policy explicitly spells out "You waive any right to privacy. You waive any right to inspect or approve uses of the Content or to be compensated for any such uses."

So when will the first book with people's private conversations come out? Even if it seems pretty unlikely that AOL will do something like that, this new policy does give them the right to.

It also explicitly denies you the right to any compensation or to approve such uses upfront.

March 11, 2005

Combining photos with maps

next stop: semantic metaverse » Mapping flickr Photos discusses Mappr - an application that overlays flickr photos on a map.

I love the concept - while surfing the net it's always struck me that things like travel descriptions, isolated pictures etc. feel disconnected. I'm a very visual person, and can often "picture" large interconnected systems, but picturing the web is hard - structural relationships are not obvious.

Mapping data found online to geographical locations is a great way of creating a visual relationship that makes the structure easier to picture. And photos are ideal for this kind of application because photos are "connected" to the geographical locations - knowing where they were taken will help us fill in our mental image of what context they belong in.

While you're at it, take a look at the rest of next stop: semantic metaverse too.


Rat in a maze

This week has been strange. It's gone by as if in a daze. It's partly the new office - new routines, new locations, builders swarming around us desperately trying to finish up the last bits and pieces. It's partly been the change in my commute, an extra change and a new few minutes' walk from the station to work, through much busier parts of London, trying to avoid being run down by buses and taxis.

I think I like it. I love coming in to Charing Cross in the morning, the sights along the train tracks between London Bridge and Charing Cross are ... interesting. The tracks are raised above street level, and you get to see all the quirky warehouse conversions mixed with an eclectic collection of century old buildings while you seem to float between them almost at roof level.

I enjoy the walk up past Trafalgar Square and St. Martin-in-the-fields, past the theaters and the bustle that we lacked near Victoria Station.

And I've already made use of the proximity to Chinatown once to pick up dim sum from my favorite Chinese restaurant, New World (a must if you visit London and love dim sum, though expect shoddy service).

All I'm waiting for now is the days to get longer... It still doesn't feel like spring.

Innovation through inaccurate emulation

On the train home today I suddenly, for no apparent reason, started thinking about a subject that has interested me for a long time: The idea of innovation as a side effect of inaccurately emulating something else.

If you've ever looked at a picture of some device and imagined how it works and then been bitterly disappointed once you got a chance to try the real thing, then you've experienced this phenomenon.

For me, it's such a frequent occurrence that if I want to get ideas for improving something I'm working on I will not under any circumstance try out similar products or software before I've spent a lot of time looking at screenshots and brief product descriptions and trying to imagine what the software does.

I find it fascinating for another reason too - it demonstrates how extremely hard it is to create user interfaces that are "obvious" - if how the interface works was obvious from a screenshot, I'd be "stuck". It's when it's not clear how something works that the room for coming up with innovative ideas about how it ought to work is the greatest.

Better Bad News satirises Google AdLink

BetterBadNews has a funny take on Google's controversial AdLink.


Link love

Burningbird » Guys Don’t Link is an interesting (and hilarious) article on male bloggers' obsession with links ('Mags, are you telling me that guys equate links with their dicks?'...).

Personally, though, I think a large reason why the perceived "old-boys" network of "alpha bloggers" is so male dominated has a lot to do with the traditional gender division when it comes to technology usage - the large blogs are almost all dominated by geeks or at least tech savvy people who've been involved with technology for a long time.

That alone will heavily skew things.

That said, I do think that there's a lot to be said for some of the more, ahem, creative explanations as well... Some blog because they love to write, and will write long articles with few links.

Some blog for publicity and traffic, and they'll focus on linking because linking to other people's content lets you produce a perceived high value with less work.

But if you DO write long articles, you still owe it to yourself to get it out there, and adding outbound links to interesting material directly connected to what you write is a great way of getting people to link to you, and also of getting people to discover you (the power of the referrer log is great...)

'Pluto could have been a strung-out heroin addict'

From Wired News: Furthermore archive (look for the headline Disarmed Robbery)


Magpie: PHP RSS/Atom parser

Came across Magpie today, and now I know what I'm going to spend the weekend playing with... (apart from painting and flooring the rest of the living room.. sigh..)

Fired for blogging?

Anil Dash: Nobody has ever been fired for blogging provides a good summary of the recent debate over whether or not blogging is a risk or an advantage for your career.

My own take? Assume everything you write in your blog will make it back to your boss, and that your boss will know the comment was made "in public". What would you say in other public contexts where you know your boss would hear about it?


Most companies have strict guidelines about what staff can say to the press about the company (often nothing unless cleared with PR first).

While the typical blog still has a significantly smaller readership than most established news sources, the same principle to a large extent applies - if you say something on your blog about your workplace that you wouldn't dare say to a journalist or shout at the top of your lungs in public regardless of who was present to overhear it, then you should think twice.

This is no different (though the consequences may be) than giving proper regard to what you say about any other subject: If you wouldn't be prepared to take the consequences of saying something in public offline, why would you think you could evade the consequences if you say it online?

Now I finally know the right ways of getting rid of people...

Slacker Manager: The unspoken language of... gives you all the advice you'll ever need (and some that's likely to get people to snicker behind your back whenever you pass by) for ending unwanted meetings and getting rid of those annoying people who always try to get you to do something useful...

Sinn Fein punished without a trial

Sinn Fein MPs to lose allowances after the UK House of Commons vote on a motion resulting from the claim that the IRA was involved in the robbery of the Northern Bank in Belfast.

Am I the only one that finds it worrying that UK police and MPs are prepared to assume the IRA's and by extension Sinn Fein's guilt without a trial?

And worse, the House of Commons is financially punishing a competing political party based only on a report that it has had no chance to defend itself against, for actions allegedly carried out by members of an organisation it has ties to.

Since when did parliament turn into judge, jury and executioner? So much for separation of powers.

If the world was run like EBay the pimp would be redundant...

...according to the BBC. An interesting overview of loan mediator Zopa that also happens to mention escort mediator Adultwork (sexual imagery and descriptions abound - enter at your own risk).

March 10, 2005

Why we love Wil

In his latest post Wil Wheaton demonstrates why his blog is consistently one of the most worthwhile to read: He doesn't try to play it cool and act like he's seen it all, so when things work out for him - like in this New York Times article (registration required) - he gets just as giddy as the rest of us would have been over what is a thousand words of praise in one of the most prestigious newspapers in the world.

Go Wil!

Screw you Bill!

Study Says UK Workers Stressed by Deceitful Office Culture and Microsoft tries to use the study to sell software...

Apparently Microsoft software will make information flow more efficient, thereby increasing the honesty of staff. An obvious conclusion!

One might be tempted to conclude that perhaps Microsoft should give Bill and Microsoft's PR staff more Microsoft software since they seem to be a bit lacking...

Funny, though, as the article does point out, that it was a culture of deceit in US companies that gave us marvellous companies such as Enron and Worldcom.

I don't have a problem with the idea that many employers - whether in the UK or the US - feel compelled to lie to hide problems. However, it just gets sad when someone tries to use it as a way of selling software, when the main problem is not access to information but a culture where acknowledging problems with a project is in many companies considered a sign of weak management. What do you do when you deal with a company where people get terrified at the prospect of a risk and issues log, for example, and insist there are no risks?

Lenz on EU Council voting

If you're one of the people who're confused about the bizarre way the EU Council finally approved the software patents directive, you owe it to yourself to read Aussi Les Autres by Karl-Friedrich Lenz.

Click Fraud: Problem and Paranoia

Wired's Adam L. Penenberg is exploring the problem of click fraud (fraudulent click throughs on pay per click programs):

'Last week, I served on the "Click Fraud: Problem or Paranoia" panel at the Search Engine Strategies conference in New York. At one point, Jessie Stricchiola, one of my fellow panelists, tried to gauge the extent of the problem by asking the 80 people in attendance to raise their hands if they had ever been victims of "click fraud."

About half of the audience members, most of them small businesses owners, raised their hands. '

My automated TV scheduler...

... is my fiancee. I've always hated reading the TV programme to find out if something I want to watch is on, so generally I'll forget, or just turn on and zap or look through a small selection of channels such as Sky One that tend to deliver a healthy mix of Star Trek and Simpsons 50% of the time...

But whenever my fiancee is at home I know I don't have to bother - she'll without fail already have programmed reminders for all the shows that are worthwhile to watch that evening.

I'll have to make sure we stay together forever.

R. Robot: The first self-writing weblog

Why write your own blog when you can write a script to do it for you?

GNOME 2.10 released

Gnome 2.10 is out.
Release notes here.

Highlights include file manager UI improvements, more background images and patterns included, the Totem video player, the Sound Juicer CD ripper, new version of Evolution with better offline and calendar support, and lots more

The myths around Europe's high taxes

According to the Christian Science Monitor, a flat-tax movement stirs Europe.

The article makes much of the idea of how high taxes are in Europe. However, while the tax levels are in many cases higher than in the US the gap is generally smaller than people believe it to be.

The article also presents a lot of the arguments used for a flat tax rate, but leaves out most of the usual counter arguments.

I'll briefly discuss both of these points.

Personal taxation in the UK and US

I'll compare the US and UK levels since the UK is the European nation I have direct experience with. The UK is generally seen as having less of a tax burden than countries like France and Germany, or the Scandinavian countries. However, my own experience from Norway is that while there are some differences, you need to earn well above average to see much of the difference.

The average salary for men in full employment in Great Britain in 2003 was ca. 525 GBP a week, or ca. 27300 GBP a year, which using today's exchange rate translates to roughly $52500. The median salary for men in full employment in the US in 2003 was $40668.

So let's look at the tax burden of people earning $40k and $50k respectively. Let's also look at someone earning 100.000 GBP, or roughly $200k. In the UK, that corresponds to around the top 1% of the population.

In the UK, income tax for these three levels can be calculated based on this table. I've used the numbers for the 2003-2004 tax year.

For someone earning the equivalent of $40k, the tax would be ca. 3300 GBP, or less than 16% of their earnings. At $50k, the tax would be ca. 4700, or ca. 17% of their earnings.

Even our rich guy - the GBP 100k/$200k earner - ends up paying "only" about 33% tax. The UK is listed in the 40%-49% bracket in the CS Monitor article because the top tax bracket is 40%, but that is only on income in excess of around UKP 35.000 (ca $70k), well above the average income.

Add to this an average council tax bill of around UKP 1200 per household, and the effective tax rate rises to around 22% if our $40k and $50k earners are the sole earners in their household, and around 34% for the $200k earner.

How does the US measure up? Let's start with the federal rates. At $40k federal tax is ca. 16.8%. At $50k about 18.5%, and at $200k about 26.6%.
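For anyone who wants to check the UK numbers, here is a rough Python sketch of the calculation. The allowance and bands are the 2003-2004 figures as I recall them (4615 GBP personal allowance, then 10% on the first 1960 GBP of taxable income, 22% up to 30500 GBP and 40% above that), so treat them as assumptions rather than a copy of the table linked above. The 20800 GBP figure for the "$40k" earner assumes the same exchange rate that turns 27300 GBP into $52500, and the function names are just for illustration.

```python
# Assumed 2003-04 UK income tax figures (from memory, not from the linked table).
PERSONAL_ALLOWANCE = 4615
# (upper limit of band in GBP of taxable income, rate)
BANDS = [(1960, 0.10), (30500, 0.22), (float("inf"), 0.40)]
COUNCIL_TAX = 1200  # average bill per household, as quoted in the text


def uk_income_tax(gross):
    """Income tax in GBP on a gross annual salary, ignoring National Insurance."""
    taxable = max(0.0, gross - PERSONAL_ALLOWANCE)
    tax = 0.0
    lower = 0.0
    for upper, rate in BANDS:
        if taxable > lower:
            # Only the slice of income inside this band is taxed at its rate.
            tax += (min(taxable, upper) - lower) * rate
        lower = upper
    return tax


def rate_with_council_tax(gross):
    """Income tax plus council tax as a fraction of gross, for a sole earner."""
    return (uk_income_tax(gross) + COUNCIL_TAX) / gross


for gross in (20800, 27300, 100000):
    tax = uk_income_tax(gross)
    print(f"{gross:>7} GBP: income tax {tax:7.0f} GBP "
          f"({100 * tax / gross:.1f}%), "
          f"with council tax {100 * rate_with_council_tax(gross):.1f}%")
```

Note how the 40% rate only ever applies to the top slice of income, which is why the overall percentage for the 100k earner stays well below 40%.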

So in other words, you need to earn well above average before you start seeing the differences based on UK income tax vs. US federal tax.

Let's estimate some state taxes... That's harder, but most of the highly populated states operate with several relatively low brackets bringing you to 3-5% easily with the income levels we're looking at. Some, like California, can approach 9% for high earners.

Assuming 3%, it would bring our guys in at slightly below 20%, 21.5% and 29.6%. The gap to the UK income tax + council tax is low. Given that the council tax is per household, it would be even lower in a household with two or more earners (UK council tax is also generally half for households with only one person).

What about local taxation in the US? This table claims that state and local taxation for most states will be around 10% of income.

Taking that into account, our three guys would end up with 26.8%, 28.5% and 36.6%...

Now who has high taxes?

Note that this comparison doesn't take into account health insurance vs. UK National Insurance. The latter is compulsory, and charged together with your tax. Adding in national insurance would bring the total UK personal taxation levels up to approximately the same levels as the US, but with significant added services (free health service and some access to free dental services)

It also does not take into account the differences in sales tax / VAT. UK VAT is low by European levels, at 17.5%, but high by US levels. I have on purpose not included this, as "ordinary people" tend to talk about the personal tax rates when complaining about tax levels and including VAT / sales tax directly would also require you to consider the differences in purchasing power and price levels.

So why do people think that taxation in Europe is so high?

Personally, as a quite well paid engineering manager, my TOTAL tax burden (including local taxes and national insurance) in the UK is around 32%, and with full health insurance on top of what NHS provides I'd pay an extra 1% max.

My answer is because of articles like the CS Monitor article that compares the top rate for the highest tax bands, ignoring that almost nobody pays those rates for more than a small part of their income.

In other words: Europe's taxation levels look high because we for the most part have a highly progressive tax system.

Which brings us to the next part...

The arguments AGAINST flat taxation

Much is made of how "everyone" gets to pay a "low" rate of 19% income tax in Slovakia. But as we've seen, in a country with an extensive health system such as the UK, the tax rate for average earners, who earn FAR MORE than the average earner in Slovakia isn't significantly higher.

What happens with a flat rate is that the vast majority end up paying higher rates than what they would in a progressive system, as a TINY minority gets tax reductions.

The typical argument for this system is that it stimulates investment. However, that is baloney. It stimulates migration of investment from places with higher tax rates for the rich. Now, for a nation such as Slovakia, that is fine - they need the investment. The overall end result for them might be positive even if low income earners pay higher tax as a result.

However, it is not a sustainable way of attracting investment: if enough capital is drawn away from other countries, they'll find ways to give tax breaks to prevent the capital drain.

If capital remains in the country, there are few reasons to believe that lower tax significantly increases investment activity - in a high tax environment, on the contrary, you need to be significantly more aggressive to see the same net return and achieve the same increase in wealth. Investors won't lean back and say "ok, we've made enough money now, because if we work harder we'll just be taxed more".

This is even more apparent in countries like Norway, which taxes not only income, but also personal wealth, so that if you don't put your money to use, the combination of inflation and taxation will eat away at your capital.

Add to that a strong sense that progressive taxation is more just - a 20% tax rate for someone just making ends meet really affects their standard of living - a 35% tax rate for someone making GBP 100k is peanuts. The tax burden is already by far heaviest for people who make little to start with.

These are a couple of reasons why, contrary to what the CS Monitor might claim, we hardly hear about flat tax here. Even the right wing parties don't usually suggest it in western Europe at least.

Wil Wheaton on CSI

If you're in the US, you've got a chance to see Wil as a crazy homeless guy in CSI: WIL WHEATON dot NET: Where is my mind?: check local listings

I've never followed CSI, but it would be fun seeing Wil in a new role, so I guess I should figure out when it'll be shown here in the UK. Especially after reading his entries about the filming.

March 09, 2005

Christian Engstrom: Which minister is lying?

From FFII: Bendt Bendtsen (Danish Minister of Finance, Trade and Industry) or Laurens Jan Brinkhorst (Dutch Minister of Economic Affairs). Got to love that page.

Look here for some context

Don't want people to link to you? Then don't make your page publicly available

Lawrence Lessig wrote on his blog the other day about how the syndicators of a Bill O'Reilly column are complaining about "unauthorized linking".

So following Lessig's invitation, here is the offending link.

A suggestion to the syndicators: If you don't want people to post "unauthorized" links, either password protect the page, or require people to view a page setting out terms first, or verify that the referrer field is set to an "authorized" source. It's not hard to do - lots of people have done it before you.
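To show how little work the referrer check is, here is a minimal Python sketch. The host names and the `is_authorized_referrer` function are made up for illustration, and how you get hold of the Referer header depends entirely on your web server or framework. Keep in mind that browsers can omit the header and anyone can fake it, so this only discourages casual linking.

```python
from urllib.parse import urlparse

# Hypothetical list of sites that are allowed to link to the column.
AUTHORIZED_HOSTS = {"www.example.com", "partner.example.org"}


def is_authorized_referrer(referer_header):
    """Return True if the Referer header points at an authorized host."""
    if not referer_header:
        # No referrer sent at all (typed URL, bookmark, privacy settings).
        return False
    host = urlparse(referer_header).hostname
    return host is not None and host in AUTHORIZED_HOSTS


print(is_authorized_referrer("http://www.example.com/columns/"))
print(is_authorized_referrer("http://some-blog.example.net/2005/03/"))
```

A request that fails the check could then be redirected to the page setting out the terms instead of the column itself.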

UK: No sunset clause on control orders

Tony Blair has just won a key vote in the House of Commons rejecting a sunset clause for the terrorism bill that would have seen it expire in November.

What does his insistence on not accepting a sunset clause tell us? That he thinks there's a significant chance he'd be unable to get the majority needed for an extension 8 months on...

Gives you a lot of confidence, now, doesn't it?

Update: BBC now has an article summarising the votes in House of Commons

Dan Gillmor: Terrorism and the Internet

Head over to Dan Gillmor's blog for info on his participation at the International Summit on Democracy, Security and Terrorism in Madrid.

Related to the same conference, Joi Ito points out that the organizers have restricted press access to the summit. Note: See update below

I hope more people see the irony of restricting press access to a meeting which, in its conference FAQ has quotes like "The Summit will rest on the idea that democratic government is the only legitimate —and still the only effective— way of fighting terrorism. Only freedom can save freedom, and the struggle against terrorism can only succees by the rule of law."

Did somebody forget to tell them that press access and open debate are among the cornerstones of democratic society?

Update: Joi Ito has updated his blog to mention that apparently the organizers did not want to restrict journalist access but did so only because of logistical problems. Let's hope that's correct.

Get rich and famous...

by blogging at work. All you ever wanted to know about how to talk about your co-workers behind their backs online without getting fired...

Now, why did I plaster my name all over my blog... I should have created a cool alter ego or something.

Algorithmic world building the Elite way

Ever since I realised how Elite worked, the concept of using algorithms seeded by a pseudo-random number generator has fascinated me.

In Elite, the whole "universe" was formed by seeding a simple pseudo-random number generator with a fixed value and following fixed steps to generate "galaxies", and then seeding the same generator with a number chosen per planet to generate a sequence of values that made up the planet's characteristics.

This approach is in direct opposition to the traditional idea of designing levels. However, it also differs from a similar branch, of which one early example is Sim City, where a basic map is generated based on a seed (possibly provided by the user) and the same map can be regenerated from the same seed.

Where Elite and Sim City differ is that in Sim City the number generator is used to create the basic state of the game, and other algorithms are then used to mutate that state as time passes.

What really makes Elite different is that the universe is 100% built via the random number generator: The map can be derived from the generator algorithm and a seed. The planets' characteristics can be derived from the planets' seeds. The trading prices can be derived from the planets' characteristics plus a "random" adjustment recalculated on each hyperspace jump. The presence of police and other ships is entirely based on "random" numbers.

Nothing really changes, it's just randomly perturbed, and no state needs to be stored.

Long after Elite stops being appealing as a game, this idea very much remains a major contribution that few games take heed of today.

One of my dreams is to one day have the time and the cash to take this idea to its extreme: Due to the limitations of the computers of its day, Elite made limited use of the algorithmic generation of its world. The planets had some descriptions and a technology level, for instance, but not much more.

It also stuck to just an algorithm that was made to look relatively random. However there is no reason to be limited to that. The main lesson of Elite is that a compelling game world can be created by hierarchically structuring algorithms that can a) be seeded by a value, and b) will generate the same sequence of values when given the same seed.
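A minimal sketch of that hierarchy, under stated assumptions: the generator below is a splitmix64-style mixer chosen for brevity and is NOT the generator Elite used, and the names Rng, Planet, make_planet and make_galaxy as well as the value ranges are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Tiny deterministic generator (splitmix64-style). Not Elite's algorithm;
// it only stands in for "same seed in, same sequence out".
struct Rng {
    std::uint64_t state;
    explicit Rng(std::uint64_t seed) : state(seed) {}
    std::uint64_t next() {
        state += 0x9E3779B97F4A7C15ULL;
        std::uint64_t z = state;
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
        return z ^ (z >> 31);
    }
};

// Hypothetical planet record; the fields and ranges are illustrative only.
struct Planet {
    std::uint64_t seed;
    int tech_level;   // 1..15
    int population;   // 1..100, arbitrary units
};

// Lower level: everything about a planet follows from its own seed.
Planet make_planet(std::uint64_t planet_seed) {
    Rng rng(planet_seed);
    Planet p;
    p.seed = planet_seed;
    p.tech_level = static_cast<int>(rng.next() % 15) + 1;
    p.population = static_cast<int>(rng.next() % 100) + 1;
    return p;
}

// Upper level: the galaxy seed yields one seed per planet.
std::vector<Planet> make_galaxy(std::uint64_t galaxy_seed, int planet_count) {
    assert(planet_count >= 0);
    Rng rng(galaxy_seed);
    std::vector<Planet> planets;
    for (int i = 0; i < planet_count; ++i)
        planets.push_back(make_planet(rng.next()));
    return planets;
}
```

Calling make_galaxy twice with the same seed gives identical planets, and any single planet can be rebuilt from its seed alone with make_planet, so nothing but the seeds ever needs to be stored.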

Another lesson can be drawn from Elite: The illusion of a dynamic universe can be achieved by perturbing the statically generated values. In Elite the trading prices change. The changes are random. However, there is still a seeming logic to it, logic that's important to the game - planets with low technology levels have high prices on computers, for instance. But this is all achieved by applying semi-random changes to data generated from the planet's technology level and seed.

However, the prices will never keep diverging further from the static values, because each perturbation is applied to the statically generated value rather than accumulated on top of the previous price.
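To illustrate why such prices cannot drift, here is a small sketch. The price formula, the +/-10 range and the function names are invented for the example and are not Elite's actual numbers; the point is only that every price is the static base plus a bounded offset, with nothing carried over between jumps.

```cpp
#include <cassert>
#include <cstdint>

// Static part: derived only from planet data (hypothetical formula).
// Lower technology level means more expensive computers.
int base_computer_price(int tech_level) {
    assert(tech_level >= 1 && tech_level <= 15);
    return 200 - 10 * tech_level;
}

// Price after a hyperspace jump: static base plus a bounded offset taken
// from this jump's "random" value. No previous price is consulted.
int computer_price(int tech_level, std::uint32_t jump_random) {
    int offset = static_cast<int>(jump_random % 21) - 10;  // -10..+10
    return base_computer_price(tech_level) + offset;
}
```

With these made-up numbers a tech level 1 planet always charges more for computers than a tech level 15 planet, however the offsets fall, which is the kind of seeming logic described above.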

This can be taken further. Things like scripted storylines in games can be combined with seeming full freedom without forcing a gamer's hand. Many scripted games will force the storyline through by making it impossible to deviate, creating what seem like arbitrary limitations that quickly make the game boring.

An alternative is to establish algorithms that set the parameters of the world where one of the arguments to the algorithm is the time, but allow a "fudge factor". Say a war is raging between two factions, and one is meant to win for the story to move forward. If a gamer starts interfering, something needs to be done.

The "Elite like" approach could be to apply random perturbations to the forces' relative strengths and to political and economic events that could affect the outcome, but with the addition that the result from the number generator gets weighted more towards the opponent the more a gamer upsets the balance.

This DOES allow a player to upset the balance temporarily, but the longer and more the player interferes, the harder it will become by virtue of the weighting - suddenly the opponent will get reinforcements, nearby governments will get overthrown by movements supporting the opponent etc.

A lot of events that are NOT scripted may have to be generated "randomly", but due simply to the complexity of the scenario a "random" sequence of events will often be interpreted by humans as having more meaning than it does, and will seem more natural than a situation where one side's forces keep losing no matter what the relative strengths are to start with.

In other words: Script only what you absolutely need, leave the rest to a pure, side-effect-free algorithm combined with random perturbations.
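One way the weighting could look, as a rough sketch only: the function name, the 0.1 step per unit of interference and the 0.95 cap are all assumptions picked for illustration, and "roll" is simply a value in [0,1) taken from whatever number generator the game uses.

```cpp
#include <algorithm>
#include <cassert>

// roll: a value in [0,1) from the number generator for this event.
// player_interference: how far the player has pushed against the faction
// the story needs to win (0 = not at all). Hypothetical scale.
// Returns true if this event goes in favour of the scripted winner.
bool event_favours_scripted_winner(double roll, double player_interference) {
    assert(roll >= 0.0 && roll < 1.0);
    double pressure = std::max(player_interference, 0.0);
    double chance = 0.5 + 0.1 * pressure;  // bias grows with interference
    chance = std::min(chance, 0.95);       // never a complete certainty
    return roll < chance;
}
```

With no interference the outcome is an even coin toss; the more the player has interfered, the more of the possible rolls turn into reinforcements or friendly coups for the other side, yet the player can still win individual events.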

Incidentally this is what Elite does with its missions: Once you meet the criteria, only a few triggers are present to decide whether or not you get a mission and whether or not you meet the ships you're sent after - everything else is left to chance, and given your incomplete information it looks like you're on a search while you're aimlessly moving about waiting for the number generator to decide it's your time...

More later...

Brave New World

Blair defiant on terror measures despite massive opposition in the House of Lords.

It is beyond me how Blair can stomach introducing what in effect is a return to the days when the government could lock someone up without evidence.

The very fact that these "control orders" are intended for use in cases where the government can be reasonably sure NOT to be able to secure convictions should be a big red flag.

Now, there's a lot to be said for having the ability to track or restrict the movement of people where there is a significant risk that they are involved in terrorist activity. However, there is a huge difference between allowing the Home Secretary to unilaterally decide to lock someone up (which stinks of fascism) and allowing the government to secure such "control orders" via a judge, after having provided a reasonable level of documentation and allowing a security-cleared barrister to argue on the person's behalf.

The scariest aspect of this is how the terrorism threat has been pushed to the front of people's minds even when a tenfold increase in terrorism over the levels of the last 5 years would still make terrorism a trivial risk to people's lives and safety compared to a wide range of other issues, such as road accidents.

This is particularly bizarre in the UK, where IRA terrorism was a fact of life for decades - I worked in London and regularly passed not far away from the BBC building around the time of the car bomb there, and also remember well the calls from relatives after the Ealing Broadway bomb (the last two (?) of the IRA-related attacks against targets in England).

Despite the continued threat of attacks, including the infamous attack against the Conservative party conference in Brighton in 1984 that nearly left the UK with large parts of the cabinet killed, the UK has not seen such brutal attacks on civil liberties in decades.

Maybe the UK government should focus a bit more on the things that actually kill the most people instead?

And perhaps they should also consider that the current terrorism is in part an attack on democratic ideals and in part an attack on what is presented as Western abuse of power - by attacking civil liberties the UK government is falling into the trap of both giving these would-be attackers what they want and at the same time handing them further ammunition.

Computer case that truly blows

MiniITX.com is the place for all kinds of cool projects based around MiniITX and other small form factor motherboards. Today they're featuring The Cool Cube - a computer case built entirely out of case fans...

Design patterns and embedded design

RealTime Mantra at eventhelix.com is a useful resource covering varying aspects of realtime and embedded software design.

It also contains a great set of sequence diagrams for implementing various network protocols, as well as design patterns and a more formal pattern catalog. While the main focus is on embedded and realtime design, much of the material is valuable guidance for designing network services and other software as well.

March 08, 2005

Something you're ashamed of?

Post it to GroupHug.us, or alternatively just read other people's confessions and laugh your ass off (or get grossed out).

Your Functional Programming Language Nightmares Come True

The other day I covered Wouter's languages.

But some of them actually make a great deal of sense. Quite unlike the pure evil that is Unlambda - a pure functional language based exclusively on combinators.

Seeing Unlambda brought back bad memories of INTERCAL.

But you can't cover perverse languages without mentioning Befunge. Befunge is particularly evil because the program text can change direction...

I'll round off with this list of esoteric programming languages, most of which can't ever hope to rival Unlambda, INTERCAL and Befunge in the amount of pure evil unleashed on the world.

Do YOU work with fools?

Some of the stories at iWorkWithFools.com seem strangely familiar... Of course I'm talking about past jobs.... :)

Geek porn...

Samsung Offers 82-inch HDTV TFT-LCD display panel. Drool. Something tells me I won't like the price.

Compile time Towers of Hanoi

Blame this guy for inspiring me (but read his article - it's a great summary of the basics of template metaprogramming):

#include <iostream>

// Move N discs from peg From to peg To, with peg Using as the spare.
template<int From, int To, int Using, int N>
struct Hanoi {
    typedef Hanoi<From,Using,To,N-1> prev_move;
    typedef Hanoi<Using,To,From,N-1> next_move;

    enum { move_from = From, move_to = To };

    static void exec() {
        prev_move::exec();
        std::cout << "Move from " << move_from << " to " << move_to << std::endl;
        next_move::exec();
    }
};

// Zero discs: nothing to move, which ends the recursion.
template<int From, int To, int Using>
struct Hanoi<From,To,Using,0> {
    static void exec() { }
};

int main() {
    Hanoi<1,3,2,8>::exec();
}

Note that while the template allows you to manipulate the solution at compile time, the exec() function generates the textual output at runtime, and the program is going to grow ridiculously big for large numbers of discs considering the simplicity of the algorithm.

(Since I'm lazy, the overall approach mimics this C implementation)
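As a small illustration of something that really is computed at compile time, the number of moves for N discs can be worked out by the compiler alone. This is a sketch added alongside the code above, not part of it, and the name HanoiMoves is made up for the example.

```cpp
// Number of moves needed for N discs, evaluated entirely by the compiler:
// moves(N) = 2 * moves(N-1) + 1, with moves(0) = 0.
template<int N>
struct HanoiMoves {
    enum { value = 2 * HanoiMoves<N - 1>::value + 1 };
};

// Zero discs need zero moves; this specialisation ends the recursion.
template<>
struct HanoiMoves<0> {
    enum { value = 0 };
};
```

HanoiMoves<8>::value is a compile-time constant (255 for the 8 discs used in main() above), so it can be used as an array size or another template argument without any code running.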

'The internet is shit'

This site is quite remarkable.

On the surface it's a short treatise or manifesto that boils down to the fact that the internet isn't some magic solution to all the world's problems, but just a tool, and that people need to realise this and take it for what it is, rather than put it on some sort of pedestal.

Not much new there. Or it shouldn't be new, at least...

What makes the site intriguing is the presentation. Instead of some flashy design, it's presented as a series of 11 slides, and nothing else. No contact information, no links to more information. Nothing.

It's strangely effective, and serves to underline and draw attention to the message in a way I don't think a more normal presentation would have.

Losers... Now they're on the internet...

Apparently losers can now be found on the internet. Who would've thought?

Random Access Memory: An experiment in collective recollection

Random Access Memory is a strangely compelling site.

It allows you to browse and add (if you register, which is free) fragments of memories. Memories can be browsed and searched by year or by subject.

Try a subject like love for instance, and you'll find short fragments spanning decades - some just a line or two where you can spend ages trying to figure out what it meant for the person who wrote it; some longer entries painting vivid images.

Or try a year that meant something to you - say 1975 - and get an eclectic mix of fragments relating to everything from particular people to relationships and more.

It's one of the most successful conceptual art projects I've seen online, and a great experiment in that it's one of those things that are impossible to predict the outcome of once you start. Well worth a visit.

Transcript of EU Council Meeting on Software Patent Directive

Head over to Groklaw for a transcript and discussion of the EU Council meeting that adopted the software patents directive.

FFII also have a page with English and French transcripts as well as an audio recording of the meeting.

(Note: Groklaw calls it an EU Commission meeting, but it was a Council meeting - it's hard to keep track of all the various parts of the EU bureaucracy sometimes...)

March 07, 2005

Time passing you by?

If you feel time is going too fast; that things are too transient; that society is focused too much on the here and the now, then The Long Now Foundation is for you.


Free tech books online

There are lots of free books online, but while there are many large collections of out of copyright works and mainly fiction, such as Project Gutenberg, copyrighted works tend to end up fragmented into small collections.

So it's great to see collections such as techbooksforfree.com's Free Programming and Computer Science Books collection. It's not as structured as Project Gutenberg, as they're linking to the sites of the various copyright owners, but it's a surprisingly comprehensive collection.

European Anti-Software Patent Bribe Pledge Drive

The European Anti-Software Patent Bribe Pledge Drive would have just seemed like a bad joke if it didn't seem like such a realistic option after today's blatant violation of democratic principles by the EU Council.

Haunted by Wesley Crusher

I've been reading Wil Wheaton's blog, WIL WHEATON DOT NET, for several years now, and he's one of the funniest writers out there. He's gone through a remarkable transformation as he's gotten more experience.

So now, when he faces the possibility of having his former alter ego voted most annoying Star Trek character, I naturally have to support him by voting for someone else.

While I can truly say I appreciate his writing much more than I did his role as Wesley Crusher, that's more because his writing is really good than because I had any problems with his acting.

If you don't already, go read his blog.

(Who else could I vote for but Neelix? Did that character have ANY redeeming features?)

NoSoftwarePatents.com: Council sends a mockery of a CP to the EP

For more on the EU Council's approval of the software patent directive, see this article: Council sends a mockery of a CP to the EP

One interesting tidbit from the article: 'Danish minister of economic affairs Bendt Bendtsen explained that Denmark would have preferred for the position to be renegotiated but that he was "not going to stand in the way of the formal adoption" and instead decided to attach a written declaration to the Common Position. The EU Affairs Committee of the Danish parliament had instructed him on Friday to demand a renegotiation of the controversial position.'

This means the Danish minister apparently decided to ignore parliament... Where have we heard that one before? Oh, yes, that's right, the Dutch minister did the same last year.

It seems the Council have decided that the only way to get its work done is for its members to blatantly ignore their respective parliaments and do whatever they feel like. The level of arrogance that represents from an organization consisting of unelected representatives of governments that in many cases do not even have majority support in their respective parliaments is staggering.

EU Council Presidency violates Council rules to force through software patents

According to this article the EU Council Presidency decided to force software patents through the Council against objections from at least Portugal, Denmark, Poland and Cyprus, despite the fact that discussion of a point as a B-item, according to the EU Council's Rules of Procedure, can only be rejected by a majority of the Council, and not by the EU Presidency.

See also this BBC News article

This decision by the Luxembourg EU Presidency makes a complete mockery of the idea of a democratic EU, after the EU Commission earlier decided to overrule the directly elected parliament and leave the decision to the Council. Now the Council have in effect seen its Presidency blatantly ignore the Council's own rules in order to stop dissent and prevent an embarrassing defeat for the Commission.

Unless there's some way of fighting this Council decision, the remaining chance now is that the Parliament will revolt against this undemocratic and arrogant move and refuse to approve the draft directive on second reading. However, that is an uphill battle considering that an absolute majority of ALL MEPs is needed, and a significant number of MEPs usually won't attend every sitting.

Trials and tribulations of a new office

Just getting settled in at my new office in central London. From a relatively "cosy" office with "only" 80-90 people per floor we've gone to a building where the walk to the soda machine is an epic journey...

Amazingly enough almost everything seems to just work. IT did an amazing job in getting things hooked up, and apart from a few machines being denied access to the network due to wrong configuration everything has just worked more or less out of the box.

My one gripe so far is the meeting rooms... Why, oh why are the meeting rooms on the floor that houses only IT/engineering staff and HR being primarily allocated to client-facing meetings, while the floors for sales and marketing etc. have plenty of rooms for ad hoc meetings?

And whose idea was it to name all the meeting rooms on one of the floors after pubs? I can just imagine all the intentional "misunderstandings" when people are told to meet in Slug and Lettuce. For that matter, on other floors we now have rooms like Reading, Milton Keynes and Bognor Regis, as well as Bondi, Ipanema, Waikiki and other beaches...

The area, though, is great. Just minutes away from Charing Cross Road for bookshops, Tottenham Court Road for electronics, as well as Chinatown... Mmm. Dim Sum for lunch all this week?

March 06, 2005

The metamorphosis of Prime Intellect

The Metamorphosis of Prime Intellect is an engaging science fiction novel available for free online.

It explores a concept that is fairly well explored in Science Fiction, namely the idea of a computer powerful enough to create a virtual or semi-virtual world where humanity can live without the problems of 'reality'.

Most recently we've seen it in the Matrix trilogy, but the idea is far older. Arthur C. Clarke's The City and the Stars and Dennis Danvers' Circuits of Heaven rank as my favorites, but Metamorphosis of Prime Intellect isn't far behind.

Imagine a society where nobody can die, and death ends up as an art form for an avant-garde group consisting of people who are oh so bored with all the "normal" stuff humans could think of to spend eternity doing. Of course, death isn't quite death when Prime Intellect will bring them right back to life again.

The main character, Caroline, was the oldest person alive when Prime Intellect came online and transformed the world, and she's bored. She was the one who turned death into art, but she's still not content. The story follows her on a quest to learn more about Prime Intellect and the basis for its decisions, and to change the world from what it has become into something just a bit closer to what it once was.

It's a compelling read that intersperses a quickly moving and unusual story with a gradual explanation of the creation of Prime Intellect.

Metamorphosis of Prime Intellect owes a lot to Isaac Asimov, both because Prime Intellect is based on his three laws and, more importantly, because the story to a large extent is structured around exploring the consequences of the three laws, or a manifestation of them in a computer, in much the same way as Asimov used his robot stories for the same purpose.

The most original aspect of the novel is that it goes into a lot more detail about the practical aspects of implementation than Asimov and most other writers dealing with this problem, who tend to resort to handwaving and explore the laws purely in the abstract, assuming a "perfect" implementation.

The Metamorphosis of Prime Intellect in many ways exemplifies how difficult software design is, where any implementation decision can have profound implications for the actual behaviour of the final system without any change to the original requirements.

The novel is fairly unique for well written S.F. in that it contains some fairly strong sexual imagery that could be upsetting to some readers.

Sex in SF is nothing new, but it tends to either be relatively tame, or be confined to trashy works with few other redeeming aspects. This book is by no means pornography, though - you'd have to be pretty disturbed to get turned on by it, and the sexual content feels natural for the general premise of the story. Clearly the novel could have been written without it, but the graphic sex and violence create a far more disturbing image of the "perfect" world created by Prime Intellect than I think most other descriptions would.

The Matrix, for instance, fails miserably there, by presenting a virtual world that seems perfectly ok for most people, and while a non-virtual dystopia such as This Perfect Day by Ira Levin is effective, it is a lot less effective today than when it was written.

All in all, I warmly recommend The Metamorphosis of Prime Intellect, provided that you can stomach the sex and violence.

It's all about accessibility

Accessibility is important. Not only because it is a legal requirement in several countries, but because those laws were introduced because access to information is a real issue for the millions of people who have a disability that limits their access to information.

It's easy to assume that this is something that applies only to large corporate sites, or government services, but with blogging taking on more and more of a role as a social and political soapbox, lack of accessibility has the potential for causing significant exclusion.

This is particularly unnerving considering how easy it is to make a website accessible - generally all it takes is a little knowledge, and putting a little bit of thought into it whenever you add content or modify your templates.

So make your site accessible. I've not yet cleaned up my blog - I only installed MT a couple of days ago - but I've made a few initial steps, thanks to Dive Into Accessibility, a great introduction to accessibility techniques which also provides a detailed rationale for all the suggestions provided, and to Bobby WorldWide, an automated accessibility testing tool.

Programming language fetishists unite!

If you, like me, are a programming language fetishist, then you'll love Wouter van Oortmerssen. I first became aware of him when Amiga E was rapidly gaining popularity with the Amiga crowd.

Wouter is one of those rare guys that never seems to run out of interesting concepts for new programming languages, as can be seen on Wouter's programming language page. Languages include Amiga E and the seminal False, as well as more bizarre languages like G, Apfelstrudel, Kartoffel and Sauerkraut (the latter three visual languages).

His programming language page is well worth a read...

WTF?!?

If you're a software developer, by profession or as a hobby, then The Daily WTF should be compulsory reading. I swear I must have worked with some of the people they've showcased code from...


'This is a sinking boat'

It sometimes takes a while, but the article A Linux Nemesis on the Rocks over at Business Week is a good demonstration that there are limits to how long you'll be able to deceive the mainstream press.

If you've been following the SCO vs IBM lawsuit over at Groklaw this article won't tell you anything new, but it's a great summary of just how badly SCO is doing.

The last sentence says it all about how quickly SCO is going from threat to minor nuisance: 'This case may be long forgotten when, and if, it ever goes to trial'.

YQ it! Contextual search from your blog

Since I highlighted My Yahoo! Search (I work for Yahoo!, though not doing search) earlier I have to mention Y!Q as well. Y!Q is a new Yahoo! service that automatically generates a query based on words extracted from data you give it.

I've added Y!Q to my blog on an experimental basis. It does quite a good job of finding relevant queries, and I'm looking forward to seeing how it works out. Here is what to add to your template if you use Movable Type.

Inside the <head> tag:

<script language="javascript" type="text/javascript" src="http://yq.search.yahoo.com/javascript/yq.js"></script>

And replace <$MTEntryBody$> with:

<div class="yqcontext">
<$MTEntryBody$>
<form class="yq" action="http://yq.search.yahoo.com/search" method="post">
<input type="hidden" name="context" value="<$MTEntryExcerpt$>" />
<div class="yqact">
<input class="yqbt" type="submit" value="Search Related Info"
onclick="return activateYQ(this)" />
</div>
</form>
</div>

Note that I'm using MTEntryExcerpt instead of the full body both because of size and because MTEntryBody expands with HTML tags, which messes up the form.

Egocasting and sex

Wired News: Comfortably Numb Relations is this week's column by Regina Lynn, Wired's sex columnist. Her column is always a great read, in part because it takes on themes that are relevant regardless of their specific application to sex.

In this case she briefly explores the idea of "Egocasting", or the (ab-)use of personalisation technology and the distance that can be created through technology to filter and create a reality that excludes dissent.

She applies it to sex, while the original article she refers to considers "mainstream" issues.

Other articles by Regina Lynn include pieces on the lack of privacy in office romances thanks to technology, legal problems for sex aid users in Alabama, and where to find quality cybersex.


More places to ping

More places to ping - not much else to say really. Great list of sites to ping when your blog updates.

Personalized search

If you haven't tried it yet, give My Yahoo! Search a spin. I work for Yahoo! so I'm naturally a little biased, but I truly love this beta service.

It allows you to create your own collection of sites to search in, and gives you various ways to organize them. It also lets you add any search result instantly to your collection, and can keep track of which search results you click through to.

Combined, this works better for me than most bookmarking tools, as I tend to bookmark far too many sites for it to be practical to maintain a folder structure. Now I only bookmark a few sites, and add the rest to My Yahoo! Search.

Who do I ping?

As a new blogger, I found Neil's World - Pinging service run-down a great find. I had no idea which services to ping, and whether or not they'd be useful.


Linked In - social networking growing up

I first came across LinkedIn a couple of weeks ago thanks to an invite from a colleague. It's finally a sort of social networking I can actually picture myself using. Unlike Orkut, which I registered with to see what all the noise was about and subsequently ignored, and all the other similar sites, Linked In serves a definite purpose, automating something that people have been spending a LOT of time and resources doing more or less manually in the past.

Linked In focuses on adding trust to requests for contact from strangers by letting common friends mediate the contact. Just as you're likely to have introduced lots of your friends to each other for various reasons such as shared interests, job openings etc., Linked In lets you do the same thing while automating the logistics and providing people with powerful search functionality to allow them to see which of their friends have contacts that might be useful 2, 3 or 4 steps removed.

Since all requests go through the intermediaries, this acts as a deterrent against meaningless attempts to spam people 3 or 4 steps removed: unless you have a genuine reason to reach them, you're likely to get your request ignored and risk alienating contacts that may be your friends, or at least valuable contacts that you won't want to lose.

So far I've found a lot of people on Linked In that might be useful for me to know, and I intend to experiment with it for a while and see if perhaps social networking is actually on its way to growing up.

My first entry

So, I've finally decided to do something with my website, and installed Movable Type...



About me

E-mail: vidar@hokstad.com Skype: vhokstad
Twitter: vhokstad
View my LinkedIn profile.

I was born April 21st, 1975, in Oslo, Norway. Since 2000 I've been living in London, UK. I'm married and we just had our first child, Tristan Ikemefuna Hokstad.

I'm working for Aardvark Media as Director of Technology. I'm also currently on the board of SpatialQ, a startup in the GIS space, and an advisor to Skoach, a startup doing a time management app for people with ADD.
