Monday 2 June 2008

Rambot redux

FritzpollBot is a name you're likely to be hearing and seeing more of: it's a new bot designed to create an article on every single town or village in the world that currently lacks one, of which there are something like two million. The bot gained approval to operate last week, but a village pump discussion about it is now underway.

FritzpollBot has naturally elicited comparisons with rambot, one of the earliest bots to edit Wikipedia. Operated by Ram-Man, first under his own account and then under a dedicated account, rambot created stubs on tens of thousands of cities and towns in the United States starting in late 2002.

It's hard now to get a sense of the scale of what rambot did, but its effects are still visible. All told, rambot's work represented something close to a doubling of Wikipedia's size in a short space of time (the bulk of the work, more than 30,000 articles, being done over a week or so in October 2002). The bump it produced in the total article count can still be seen in present-day graphs of Wikipedia's size. Back then the difference was huge. I didn't join the project until two years after rambot first operated, but even then around one in ten articles had been started by rambot, and one would run into them all the time.

During its peak, rambot was adding articles so fast that the growth rate per day achieved in October 2002 has never been outstripped, as can be seen from the graph below (courtesy Seattle Skier at Commons):

There was some concern about rambot's work at the time: see this discussion about rambot stubs clogging up the Special:Random system, for example. There were also many debates about the quality and content of the stubs, many of which contained very little information other than the name and location of the town.

The same arguments that were made against rambot at the time, mainly to do with the project's ability to maintain so many new articles all at once, are being made again with respect to FritzpollBot. In the long run, the concerns about rambot proved to be ill-founded, as the project didn't collapse, and most (if not all) of the articles have now been absorbed into the general corpus of articles. The value of its work was ultimately acknowledged, and now there are many bots performing similar tasks.

In addition to the literal value of rambot's contributions, there's a case to be made that the critical mass of content that rambot added kickstarted the long period of roughly exponential growth that Wikipedia enjoyed, lasting until around mid-2006. I don't think it's unreasonable to suggest that having articles on every city or town in the United States, even if many were just stubs, was a significant boon for attracting contributors. From late 2002 on, every American typing their hometown or their local area into their favourite search engine would start to turn up Wikipedia articles among the results. The stubs also served as a base for redlinks, which in turn helped build the web and generate an imperative to create content. Repeating the process for the rest of the world, as FritzpollBot promises to do, would thus be an incredibly valuable step.

Furthermore, as David Gerard observes, when rambot finished its task the project had taken its first significant step towards completeness on a given topic. Rambot helped the project make its way out of infancy; now that the project is in adolescence, systemic bias is one of the major challenges it faces, and hopefully FritzpollBot can help existing efforts in this regard. Achieving global completeness across a topic area as significant as the very places where humans live would be a massive accomplishment for the project.

Let's see those Ws really cover the planet.

1 comment:

Gregory Kohs said...

I have seen this mentioned on a couple of mailing lists and blogs, but nobody seems to have found the need to answer an important question -- What will be the source database of this new bot? Is everyone comfortable that it is an accurate, unbiased source of information?
