Saturday, 30 January 2010

What happens to unreferenced BLPs?

Those of us who live other than under rocks will no doubt be aware of the latest controversy over Wikipedia's approach to biographies of living persons (BLPs). It concerns last week's deletion of a large number of BLPs that had been tagged as unsourced and had not been edited for more than six months. The deletions sparked a giant administrators' noticeboard discussion, a request for arbitration and now a request for comments on how to proceed from here.

At the crux of the dispute is how seriously the project is to take the modified standards that it has adopted with respect to biographies of living persons.

Debates of this sort are usually run along inclusionist/deletionist lines, but really the more important philosophical dichotomy when it comes to BLPs is between eventualists and immediatists. Wikipedia on the whole favours an eventualist perspective - facilitated by the almost immeasurably large potential pool of labour out there - but the BLP policy is essentially a localised switch to immediatism: unsourced material needs to be sourced post-haste, or else removed.

Conceptually it's an elegant and attractive approach. But a major flaw with it is our attraction to eventualism. We just can't shake it off.

The category of unreferenced BLPs, and its many subcategories, tracks BLP articles that have been tagged as having no sources. At the time of writing there are over 47,000 of them, some tagged as long ago as December 2006. Evidently any sense of urgency has passed those by. The backlogs mount until they approach the point where individual editors have difficulty comprehending the problem, let alone working to address it. Frustration builds at the inevitable inertia, until something radical happens, like these mass deletions.

Is this view accurate? Is the problem of unsourced BLPs really out of hand? We can try to answer these questions by looking at the way the backlog has been managed.

Unfortunately, the data available for this purpose is somewhat limited. Database dumps older than the 20 September 2009 dump are currently unavailable due to maintenance. However, the September dump, along with the dumps from 28 November 2009 and 16 January this year (shortly before the deletions started), does offer three data points to start with.

The monthly subcategories from October 2006 to August 2009 inclusive were common to all three dumps. The total number of articles in these categories declined from 50,715 in September to 43,655 earlier this month, a 13.9% fall. However, over the same period, the total in all subcategories through December 2009 rose from 50,715 to 51,301, a 1.2% increase. At least over this period, new additions outweighed articles being removed from these categories.
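Both percentages follow from the ordinary relative-change formula, (new minus old) divided by old. As a quick check, here is a minimal Python sketch using only the totals quoted above; the function name is just illustrative.

```python
def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


# Totals quoted above for the September 2009 and January 2010 dumps.
common_sept, common_jan = 50_715, 43_655  # subcategories October 2006 to August 2009
all_sept, all_jan = 50_715, 51_301        # all subcategories through December 2009

print(f"{percent_change(common_sept, common_jan):.1f}%")
print(f"{percent_change(all_sept, all_jan):.1f}%")
```

The first figure comes out negative (a fall of 13.9%) and the second positive (a rise of 1.2%), which is the gap between the older backlog shrinking and the overall total still growing.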

It should be noted that some of these additions are articles that had already been tagged but were left unsorted, and were only later sorted into the monthly subcategories. In fact, ten of the thirty-five subcategories common to all three dumps saw their numbers increase since September. The following graph shows the change in the monthly category totals over the roughly four months between the September and January dumps:

Without analysing the actual changes in the lists of articles in these subcategories, it won't be possible to tell which of two things is happening. Either the sorting process is merely outweighing the normal reductions from articles being referenced or deleted, or, as I suspect, these middle-aged subcategories (no longer recent, but not yet the oldest) are genuinely seeing fewer reductions. This can be the subject of further inquiry.
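One way to pursue that inquiry is to treat each monthly subcategory in each dump as a set of article titles and compare snapshots: titles present only in the later dump are additions (newly tagged or newly sorted), and titles present only in the earlier dump are removals (referenced and untagged, or deleted). The Python sketch below assumes the memberships have already been extracted from the dumps into dictionaries; the helper names and the sample titles are hypothetical, not drawn from the real dumps.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: subcategory name -> set of article titles.
Snapshot = Dict[str, Set[str]]


def diff_category(before: Set[str], after: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) titles between two snapshots of one subcategory."""
    return after - before, before - after


def summarise(before: Snapshot, after: Snapshot) -> Dict[str, Tuple[int, int]]:
    """Count additions and removals for every subcategory present in both snapshots."""
    summary = {}
    for name in sorted(before.keys() & after.keys()):
        added, removed = diff_category(before[name], after[name])
        summary[name] = (len(added), len(removed))
    return summary


# Hypothetical sample data for illustration only.
sept = {
    "June 2008": {"A", "B", "C"},
    "July 2008": {"D", "E"},
}
jan = {
    "June 2008": {"A", "F", "G"},  # B and C removed; F and G sorted in
    "July 2008": {"D"},            # E removed, nothing added
}

for name, (added, removed) in summarise(sept, jan).items():
    print(f"{name}: +{added} / -{removed}")
```

A subcategory with a net increase but a healthy count of removals would suggest sorting is masking normal progress; one with very few removals at all would support the suspicion that these middle-aged backlogs are being neglected.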

What we can say now is that the total number of unreferenced BLPs is showing a real decline for the first time in at least four months, possibly longer. It seems to have been the shock of the mass deletions that spurred people into action, either to fix these articles or to delete them. Hopefully the shock will last long enough for a significant reduction to be achieved.


Andrew Gray said...

There's a third dynamic at play here besides fixing and deleting: updating. An awful lot of articles were tagged as unreferenced BLPs either a) in error (there were extlinks, etc., but not explicitly labelled as references), or b) a long time ago.

In the second case, people quite often improve an article but forget to remove warning tags, often because it's done incrementally and each person doesn't think their contribution is significant enough to render it "fixed".

Either way, we end up with an article which is a referenced BLP, but is tagged as unreffed; removing it from the category doesn't really indicate fixing, just cleanup of the listing.
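Some of this status lag could in principle be surfaced for human review by scanning article text for tagged articles that nevertheless contain inline references or external links. The Python sketch below is only a rough heuristic under stated assumptions: it assumes the tag appears in the wikitext as `{{BLP unsourced ...}}` (the real template name and its redirects are an assumption here), and it treats any `<ref>` tag or bare `http(s)://` URL as a possible source. An external link is not necessarily a reliable source, so a flagged article still needs a person to check it before the tag comes off.

```python
import re

# Assumed tag name; the real template may have redirects or alternative names.
TAG_PATTERN = re.compile(r"\{\{\s*BLP unsourced", re.IGNORECASE)
# Inline references such as <ref>...</ref>, <ref name="x"> or <ref name="x" />.
REF_PATTERN = re.compile(r"<ref[\s>/]", re.IGNORECASE)
# Any external URL, e.g. in an "External links" section.
EXTLINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def possibly_stale_tag(wikitext: str) -> bool:
    """Flag articles tagged as unsourced whose text nevertheless contains
    inline references or external links, so a human can review the tag."""
    if not TAG_PATTERN.search(wikitext):
        return False
    return bool(REF_PATTERN.search(wikitext) or EXTLINK_PATTERN.search(wikitext))


# Hypothetical sample articles for illustration only.
tagged_with_link = (
    "{{BLP unsourced|date=March 2008}}\n"
    "'''Jane Doe''' is a painter.\n"
    "==External links==\n"
    "* [http://example.org Official site]"
)
tagged_plain = "{{BLP unsourced|date=March 2008}}\n'''John Roe''' is a singer."

print(possibly_stale_tag(tagged_with_link), possibly_stale_tag(tagged_plain))
```

This would only help with case a) and with articles improved by adding links or references; it cannot judge whether the sources actually support the content.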

I suspect at least 10% of our originally quoted "50,000!" are/were like this, and their being removed now is indicative of us catching the low-hanging fruit.

This status lag is an interesting problem - it's also very obvious with stub-tagging, where very sizable articles retain stub tags for months or years - but I'm not quite sure how we could deal with it!

