January 15th, 2006, By Duncan Gough
Which is something that the alpha version of Millions of Games had in place about this time last year. Since we were opening up what was originally going to be a basic links site to create the first and (still) only folksonomy of casual games, I was thinking a lot about using the almost perfectly-formed HTML scraper Beautiful Soup to extract as much metadata as I could from each game submission and turn that data into tags.
The most obvious example of this is the difference between a Flash game and a Shockwave one. Flash games end in .swf, Shockwave games in .dcr. It would be ‘trivial’ for a screen-scraper to chew through the HTML, find the relevant ‘object’ tag and pull out the file extension for the game. Once I had that, I could tag the game as ‘shockwave’ or ‘flash’ accordingly. From there, I could even do the same for Java games (.jar) too.
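To give a rough idea of what that scraper might look like, here is a minimal sketch. It uses Python's standard-library `html.parser` rather than Beautiful Soup so that it runs without anything extra installed, and it is not the code Millions of Games actually used. The names `GameTagScraper` and `guess_tags` are made up for illustration, and the choice of which tags and attributes to inspect (`object`, `embed`, `param`, `applet`; `data`, `src`, `value`, `archive`) is my assumption about where a game's URL usually lives.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath

# Extension-to-tag mapping as described in the post.
EXTENSION_TAGS = {".swf": "flash", ".dcr": "shockwave", ".jar": "java"}

# Assumed places where an embedded game's URL tends to appear.
CANDIDATE_ELEMENTS = ("object", "embed", "param", "applet")
CANDIDATE_ATTRIBUTES = ("data", "src", "value", "archive")


class GameTagScraper(HTMLParser):
    """Hypothetical helper: collects technology tags from embed markup."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases element and attribute names for us.
        if tag not in CANDIDATE_ELEMENTS:
            return
        for name, value in attrs:
            if name in CANDIDATE_ATTRIBUTES and value:
                # Drop any query string before reading the extension.
                path = urlparse(value).path
                extension = posixpath.splitext(path)[1].lower()
                if extension in EXTENSION_TAGS:
                    self.tags.add(EXTENSION_TAGS[extension])


def guess_tags(html):
    """Return a sorted list of auto-generated tags for a page of HTML."""
    scraper = GameTagScraper()
    scraper.feed(html)
    scraper.close()
    return sorted(scraper.tags)
```

Calling `guess_tags('<embed src="pong.swf">')` returns `['flash']`, and a page with no recognisable game markup returns an empty list.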
In practice, though, it became too confusing for users to see these extra tags that came out of nowhere. I’ve noticed that the founder of del.icio.us, Joshua Schachter, has often responded to questions along the lines of ‘why don’t you do x?’ by saying that it’s extremely hard to display that extra information without it looking messy. In theory, I could screen-scrape submitted games for metadata and auto-generate a list of new attributes for each item in our database, but displaying that in a user-friendly way becomes very hard.
Ultimately, I ruled out auto-generating tags because they reduce the value of the neighbour tags and, more importantly, they negate the act of personal expression that makes tagging such a valuable task for the end user.