BBC


Data Dumps could be full of Gold


Ian Forrester | 15:00 UK time, Thursday, 28 January 2010

Data dumps are a concept we have been thinking about over the last few years through Backstage. We would take anything and everything that was licensed in a way we could use under the Backstage licence, zip it up and simply dump it on a web server for you all to unzip and explore.
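As a rough illustration of the "zip it up and dump it on a web server" step, here is a minimal Python sketch. It assumes the licensed material has already been gathered into one local directory; the function name `build_dump` and the `MANIFEST.json` checksum file are hypothetical additions, not part of any Backstage tooling. The manifest lets anyone who downloads the archive check that what they unzipped is what was published.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def build_dump(src_dir: str, out_path: str) -> dict:
    """Zip every file under src_dir into out_path, plus a MANIFEST.json of SHA-256 checksums.

    Hypothetical helper. out_path should live OUTSIDE src_dir, otherwise the
    archive being written could be picked up by the directory walk.
    Returns the manifest (relative path -> hex digest).
    """
    src = Path(src_dir)
    manifest = {}
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Sorted walk so the archive layout is reproducible between runs.
        for path in sorted(p for p in src.rglob("*") if p.is_file()):
            rel = path.relative_to(src).as_posix()
            data = path.read_bytes()
            manifest[rel] = hashlib.sha256(data).hexdigest()
            zf.writestr(rel, data)
        # Ship the checksums inside the archive so downloaders can verify their copy.
        zf.writestr("MANIFEST.json", json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

The resulting zip can then be copied to any static web server; nothing beyond plain file hosting is assumed.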

However, three problems cropped up: first, finding data that we could clearly release as a dump; second, removing all references to personal data and/or people (anonymisation); and third, putting it somewhere sensible.

For example, we tried to get a selection of the web traffic logs out, but at 2+ GB per month, I believe, it would have been a small nightmare even to host them or move them anywhere, such as archive.org. And that's after having to remove all the secret and private information. […] learned about this last year when it gave away a dump of data for research. Obviously we would never risk our data, or yours, in this way.
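To make the "remove all the secret and private information" step concrete, here is a minimal Python sketch for scrubbing a single web log line. It assumes Apache "combined" log format, which the post does not specify, and the names `LOG_RE` and `pseudonymise_line` are hypothetical. It replaces the client IP with a keyed HMAC token, drops the authenticated user, query string, referrer and user agent, and discards lines it cannot parse. Note that this is pseudonymisation, not full anonymisation: patterns of behaviour can still identify people, which is exactly the risk with research data dumps.

```python
import hashlib
import hmac
import re
from typing import Optional

# Assumption: Apache "combined" format, e.g.
# 192.0.2.1 - alice [28/Jan/2010:15:00:00 +0000] "GET /path?q=x HTTP/1.1" 200 512 "ref" "UA"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]*)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def pseudonymise_line(line: str, key: bytes) -> Optional[str]:
    """Return a scrubbed log line, or None if the line does not parse (hypothetical helper)."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # drop unparseable lines rather than risk leaking them
    # Keyed hash: the same IP maps to the same token within one release,
    # but the token cannot be reversed without the secret key.
    token = hmac.new(key, m["ip"].encode(), hashlib.sha256).hexdigest()[:16]
    # Query strings often carry search terms or user IDs, so keep only the path.
    path = m["path"].split("?", 1)[0]
    # Referrer and user agent (everything after the size field) are not copied.
    return f'{token} [{m["ts"]}] "{m["method"]} {path}" {m["status"]} {m["size"]}'
```

Rotating the key for each release would stop tokens being linked across separate dumps.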

About this time last year, we decided to experiment with raw data stacks via an XML database (eXist-db), using data that was already public. You can find them under the […]. The Tweetstore is a good example of what we're trying to achieve with data dumps. Essentially, it archives all tweets that the official BBC Twitter users create. By themselves they're not that interesting; the value is in the patterns you can pull out over time. With good analysis it would be possible, for example, to find keywords that attract followers.
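As an example of the kind of pattern-finding described above, here is a minimal Python sketch that ranks keywords by the average follower change seen after tweets containing them. The post does not describe the Tweetstore's actual format, so the input shape, (tweet text, follower change) pairs, and the name `keyword_follower_gain` are hypothetical. It only surfaces correlations: a high-ranking keyword may simply coincide with popular programmes rather than cause the gain.

```python
import re
from collections import defaultdict
from typing import Iterable, List, Tuple

WORD_RE = re.compile(r"[a-z0-9#@']+")


def keyword_follower_gain(
    tweets: Iterable[Tuple[str, int]], min_count: int = 2
) -> List[Tuple[str, float, int]]:
    """Rank keywords by mean follower change after tweets that contain them.

    tweets: hypothetical (text, follower_delta) pairs, where follower_delta is the
    change in follower count between this tweet and the next snapshot.
    Returns (keyword, mean_delta, tweet_count), best first; ties broken alphabetically.
    """
    total = defaultdict(int)
    count = defaultdict(int)
    for text, delta in tweets:
        # Use a set so a word repeated within one tweet is only counted once.
        for word in set(WORD_RE.findall(text.lower())):
            total[word] += delta
            count[word] += 1
    # Ignore rare words: a single lucky tweet says little about a keyword.
    ranked = [(w, total[w] / count[w], count[w]) for w in count if count[w] >= min_count]
    ranked.sort(key=lambda t: (-t[1], t[0]))
    return ranked
```

With a real archive you would also want to control for time of day and for events that drive followers regardless of wording.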


We're interested in people's views on data dumps: are they useful, or are they not worth looking at unless there's a nice clean API? Also, what do people think of a hybrid model like the one we have built with the XML database? Is it still too abstract to be useful?
