<aside>Second post of the day is quite a record for me but this one isn't about microformats so you can probably look away now...<aside>
This post is partly a response to and partly the result of conversations with Matthew Wood, Chris Sizemore and John O'Donovan on our recent jaunt to . Now I think that most of our department would agree with Tom. After all we've been having these conversations for a few years now and when it comes to URL design .
When you're building anything it's always good to admit that cleverer people than you or I (or even ) came before. In the case of the web gave us and HTTP is stateless. It's the whole beauty of the web: everyone, everywhere gets the same thing from the same place. The moment you pick a fight with this design you're probably gonna get beat.
Which is not to say that people haven't picked this fight. Many websites (including the ³ÉÈË¿ìÊÖ) use to preserve state across requests. So but when you make that choice you need to be aware that all your user activity will remain uncaptured by the web - no browsability, no google goodness, no benefit to your organisation (beyond the obvious) and no caching.
So, like I say I agree with the but I'd like to try to add a fifth: if possible don't reinvent other people's web identifiers. By web identifiers I mean those fragments of URLs that uniquely identify a resource within a domain. So in the case of the entry for The Fall () that'll be d5da1841-9bc8-4813-9f89-11098090148e.
The last time we updated the /music site we made this mistake (kind of unavoidable at the time). Even though we linked our data to MusicBrainz we minted new identifiers for artists. So The Fall became /music/artist/jb9x/ where jb9x was the identifier. But jb9x doesn't exist anywhere outside of /music. We'll (hopefully) .
When we first the big attraction was 2 fold:
- stable web-scale identifiers
- - no separate deals to reuse data in APIs etc
So when the next version of /music goes live you'll see: /music/artists/d5da1841-9bc8-4813-9f89-11098090148e and the world will hopefully be a slightly better place.
Now I can already hear my old mentor saying:
Michael noooo! URIs are just identifiers for resources. They shouldn't reflect the taxonomy of the site. The resource should define it's relationships to other resources not the URI. Call them anything you like but just keep them stable.
With which I also mostly agree but - if bbc.co.uk/programmes tagged content with the same vocabulary as we'd be able to cross promote news stories from programmes and programmes from news stories by sharing APIs not databases. Tie this into personalisation and the power goes logarithmic. Read six articles on reconstruction in Iraq? Then you might like this Panaroma programme.
But if the vocabulary used to tag programmes and news was web-scale then , , etc (or someone in between) could start to aggregate stories around a shared sense of topic. This is what Chris' recent post on using wikipedia / dbpedia as a controlled vocabulary begins to hint at. It's like or except the terms returned are web native or web-scale identifiers if you will.
So what's the practical benefit: well because the new /music URLs will be based on MusicBrainz identifiers and because /music will be interlinked with /programmes and because the speaks in MusicBrainz identifiers can spend a weekend at making something that takes your Last.fm user name, extracts your favourite artists, ties them to /music and recommends ³ÉÈË¿ìÊÖ programmes. Which is a .
Taking another example for those who wish to stalk Tom Scott. His blog is at which is also his OpenID, you'll find his delicious account at , his tweets at and if you want to hire him he's at on LinkedIn. So derivadow is a web-scale identifier for Tom. It's not as strong or as powerful as a set of RDF linked URIs but if you wanna aggregate Tom-ness it's a pretty good starting point. Sadly I can't find him anywhere on Last.fm but that's possibly a godsend.
The obvious question is if web-scale identifiers are so good why did the ³ÉÈË¿ìÊÖ mint it's own for programmes? After all the the b00c4wxm used in /programmes and iPlayer is a ³ÉÈË¿ìÊÖ invention. And the answer is there were no suitable identifiers out there. I'd like to think that if Program(me)Brainz existed with stable identifiers we'd have put in the work to use those instead. But it didn't so we couldn't... But now we have stable identifiers out there on the web free to use for anyone. It would be good for example to see these identifiers adopted by . Time will tell.
One argument against all this is that web-scale identifiers are often kinda ugly. After all if Last.fm gets away with why do we need d5da1841-9bc8-4813-9f89-11098090148e. The answer is ambiguity. MusicBrainz has . Which one(s) does the ³ÉÈË¿ìÊÖ play? Probably none actually but you get the point. If we want to be exact in what we point to we need to handle ambiguity. In general we follow 3 commandments:
- URLs should be human readable
- URLs should be hackable
- URLs should persistently point to one concept
And the greatest of these is persistence. If you can't maintain stable URLs per concept don't even bother with 1 and 2. There are others that argue that . If resolving ambiguity is not important to your business then I'd agree but if you need to differentiate stuff with the same label you need unique identifiers - better yet web-scale identifiers.
Now I guess the people would say do this properly in with etc and we will do. But for hackers without PhDs the possibility of instant interoperability and quick mesh-ups is irresistible. Obviously you'll still need to establish equivalency between and but luckily that's where the people have done some of our work for us. And they're damn nice people to boot.
So I guess what I'm saying echoes Tom. Cleverer people than us have come up with ways to attach web-scale identifiers to content so why waste time reinventing. Whilst the ³ÉÈË¿ìÊÖ or *insert your organisation here* should own their data (whilst hopefully making it free - as in beer; as in speech) we don't have to own our identifiers. If we choose to use the power of web-scale identifiers we free our content to fly and . It's not exactly profound but it does feel like a small breakthrough to an .