Sunday, 13 May 2007

A threshold question

David Gerard has set a challenge to the Wikipedia community, to "construct a useful notion of 'notability' using only neutrality, verifiability and no original research". Here's my attempt.

David identifies the extraordinary subjectivity of the "notability" concept as one of its key failings, and I tend to agree. That would seem to be a fairly widely accepted position, since the most broadly accepted variations on notability have been ones which introduce some objective standard; for example the requirement for a certain number of sources addressing a subject. These are obviously not perfect or necessarily desirable standards but they're a good starting point. I think a useful notion of notability will be something that flows naturally from the core content policy and which is also capable of being at least partially objective in application.

During the Attribution debate I started work on an essay on what I think ought to be done about our content policies. It is still unfinished, but presently it is roughly three parts the essay I intended and two parts Theory of Everything.

In the essay, I attempt to show how all of our content policy may more or less be derived from the neutral point of view policy and the goal of accuracy and reliability, whether it be a principle such as not accepting original research, or a mechanism such as only utilising reliable sources. I think a similar approach can be taken with respect to notability.

A basic definition of "notability" is that a subject, to warrant coverage in Wikipedia, must have attracted some degree of attention from the outside. The difficulty has always been in identifying that degree. In my view, the threshold ought to be where achieving a neutral point of view becomes reasonably practicable, while using only material attributable to reliable sources and without resorting to original thought.

In practice this comes down to the number and quality of sources available, though it's not a matter of counting them. The question to be asked is, essentially, can NPOV be achieved with the sources available? Are there sufficient sources available, and are they of sufficient quality, that it is reasonably practicable to fairly portray all significant points of view on a subject, in accordance with their prevalence? If not, then the subject is not sufficiently notable.

So that's the principle; I'll leave the practical implementation of this for later. As a final note, consider this: two of the most problematic types of articles out there are the hack job and the puff piece (particularly the former). People doing OTRS or BLP work will have come across these types of articles time and time again. Consider the way these types of articles could be approached differently with a concept of notability based on the feasibility of achieving NPOV.

Monday, 7 May 2007

Time for a rethink on conflict of interest

I've been thinking lately that we need to reshape our approach to the conflict of interest guideline, which is about people who have a conflict between their own interests and the interests of Wikipedia, whether that be a financial interest (eg, people paid to edit), a personal interest (eg, people writing about themselves) or some other interest.

I'll identify some current issues with the guideline, and some ways in which I think it can be improved. But first, a little bit about the history of the guideline.

The page began in May 2004 under a different title, "vanity guidelines". It focused on articles created by people very close to the subject of the article - usually autobiographies or articles about relatives or companies related to the person. Interestingly, the oldest version of the page contained the advice that "vanity" articles should not be deleted simply because they may have been written by someone very close to the subject, but because of the problems that they almost invariably have: a lack of neutral point of view, the inclusion of much non-encyclopaedic content and so forth.

The page has always been related to the autobiography guideline (which dates back to July 2003), though the two were never merged. The guidelines were gradually developed over time (becoming increasingly wordy, naturally); though they always had a focus on the creation of new articles, and how editors ought to approach those, they eventually expanded to cover topics such as citing yourself - mainly addressing the issue of academics citing scholarly material authored by themselves.

In October 2006 the vanity guidelines were renamed as the conflict of interest guideline - mainly to avoid the disparaging term "vanity" but also to emphasise that the guideline wasn't just addressing people writing about themselves, but people writing about any subject that they were too close to. The need for a rewrite was declared, and by the end of the month it had more or less taken on its present form, although there have been many more changes since then.

The development of the guideline has inevitably been bound up with particular conflicts, especially relating to MyWikiBiz, as well as several prominent arbitration cases. It has evolved from focusing on writing about oneself to writing in a much more varied range of situations. In doing so it has also changed from being essentially a page offering advice to a page offering imperatives, even though that is not always its intention. I'm particularly concerned about recent trends which seem to be moving in the direction of penalising users simply for being in a position of conflict of interest, as opposed to producing poor content as a result of their conflict.

The problems that we ought to be addressing are not that someone is editing with a conflict of interest, but that someone is producing content which is not neutral, or is unverified, or is original research. The problem is with the content that is actually produced. Granted, a person with a conflict of interest is probably more likely to produce problematic content than a person without a conflict, but that doesn't mean that the person always will (nor that the person without never will).

To put it another way, the existence of a conflict of interest may explain why someone is producing problematic content, but it is not the problem in its own right. The problem is the actual content someone produces.

I realise that this is easy to say, but hard to reconcile with practical reality. At this time I think the community needs to resolve the tension arising from the question of whether we want people with conflicts of interest to continue to edit, or whether we would prefer that they not edit at all. Either approach has its downsides; as Charles Matthews identifies, an approach of discouragement sits uneasily with the need to assume good faith, but a more relaxed approach risks achieving nothing more than restating the content policies.

My preferred approach would be to discourage people with conflicts from editing as much as possible, but at the same time, give them as great an opportunity to participate through other means as is practicable. If editing material about which one has a conflict of interest is taken away as a socially acceptable option, it needs to be replaced with something else.

In a recent completely new draft of the guideline, I identified the use of talk pages, requests for comment and the OTRS system as such avenues. Hopefully the community will be able to identify a much better range of methods than this. Ideally they should involve mediating content through the community and should have a low barrier to entry.

As a final point for now, I think we also need to be more welcoming and less suspicious of people with conflicts of interest, on the proviso that at the same time we strongly discourage them from editing. We need to create an environment where disclosing a conflict and working with other editors to produce valuable content is a truly viable option for people with conflicts. A heavy-handed approach will force people underground in attempts to sneak content in, and that can only be counter-productive.

I have probably raised more questions than I have answered here. Hopefully this will get people interested in developing a more robust and well-rounded approach to conflict of interest.

Thursday, 3 May 2007

Wikipedia is not Thermopylae

The HD DVD encryption key controversy rages on, and while Digg goes out on a slender limb, other user-generated content communities, including the Wikimedia family, are still deciding what to do.

The good news is that the community seems, for the most part, to be taking a sensible course of action and rejecting attempts to put the contents of the key into Wikipedia. There are a few dissenters though, most of whom are beating the "censorship" drum and complaining about oppression of the masses and their rights to free speech.

There's a section in "What Wikipedia is not" entitled "Wikipedia is not censored", which is often clung to by these people who rail against "the Man". If memory serves me correctly, the section used to be titled differently; "Wikipedia is not censored for the protection of minors" at one stage, and "Wikipedia is not censored for good taste" at another. The latter of these is the better, in my opinion, because the point of the statement is to explain that we can neither guarantee that all content will comply with some standard of good taste, nor will we exclude content that some people find objectionable (encyclopaedic material about sex, for example).

The problem is that some people try to boil that down to a slogan, "Wikipedia is not censored", and get themselves confused. While it's correct to say that we generally don't exclude content for reasons of taste based on social or religious norms, we undoubtedly do exclude content based on our policies and based on the laws of Florida and of the United States, where the projects are based.

It also needs to be remembered that while Wikipedia is not censored (for good taste), it is not a whole bunch of other things too. It's not a soapbox, for starters, nor is it an experiment in democracy, or anarchy. It's especially not a tool for experiments in civil disobedience. It's an encyclopaedia.

United States District Judge Lewis A. Kaplan put it well in the case of Universal v Reimerdes:

"Plaintiffs have invested huge sums over the years in producing motion pictures in reliance upon a legal framework that, through the law of copyright, has ensured that they will have the exclusive right to copy and distribute those motion pictures for economic gain. They contend that the advent of new technology should not alter this long established structure. Defendants, on the other hand, are adherents of a movement that believes that information should be available without charge to anyone clever enough to break into the computer systems or data storage media in which it is located. Less radically, they have raised a legitimate concern about the possible impact on traditional fair use of access control measures in the digital era. Each side is entitled to its views. In our society, however, clashes of competing interests like this are resolved by Congress. For now, at least, Congress has resolved this clash in the DMCA and in plaintiffs' favor."

So to all the geeks itching to fight the Man, go write to your Congressman (if you live in the US and have a Congressman of course - if you don't, then I suppose you'd better nag your American friends to do so). And if you want to engage in civil disobedience, don't abuse Wikipedia in order to do so. It's not your farm to bet.

Tuesday, 1 May 2007

Nonpublic data resolution

I'll open with the standard disclaimer that IANAL (JALS).

All Wikimedia volunteers whose tasks involve access to nonpublic data (stewards, checkusers, oversighters, OTRS volunteers, developers), a group that happens to include me and many other Wikimedians, now need to identify themselves to the Foundation and be of the age of majority in whatever jurisdiction they live in, following this resolution from the Board of Trustees. In many ways the resolution is a companion to the privacy policy.

The idea was suggested by Anthere about a month ago and developed over the days following (although it has been mooted before, and was raised most recently in the context of the validating credentials debate). It seems that the resolution was passed in the middle of April, but the process of collecting information from people is just beginning.

I think it's definitely a positive development. However, it's interesting to take a closer look at the reasons behind it; there are two threads to the rationale which are seemingly running in opposition.

The first part of the rationale is an ethical consideration (one might put it more crudely and say that it's a PR consideration), and revolves around cultivating and strengthening both the internal culture and the external perception of the Foundation as a responsible organisation. In this respect it's essentially about allaying common fears, because the volunteers who currently fulfil these tasks are for the most part eminently trustworthy people. It's about underlining for the public's benefit that we take these things seriously.

The second part of the rationale is based on legal considerations; essentially, as Kat says, "we wish to be able to say who is responsible for handling this information to ensure that volunteers can be held accountable for their own actions."

Delphine put it more bluntly:

"...people who have [access to this information] are trusted with information that could mean they end up in court one day, whether to testify for or defend the Foundation. As such, they should have the capacity to act without the consent of their parents."

Perhaps a better expression would have been to act without requiring the consent of their parents :) Nevertheless, the idea that Delphine was getting at, that these volunteers need to be of the age of majority so that they can be legally accountable for their actions (to the fullest extent possible) is the crux of the matter.

sj expressed this well:
" 'This is a very important role' is not a reason to discriminate based on age. 'This is a role that requires being responsible' is likewise not appropriate. 'This is a role that requires being legally accountable for one's actions' is..."

...even though (at the time, at least) he was unconvinced of the necessity of the change.

Essentially, the legal rationale is that these various types of volunteers should be fully legally competent in their jurisdiction, so they can be as legally responsible as possible - thus insulating the Foundation against them to a certain extent.

The reason I say that the two rationales for this policy change are seemingly in opposition is that the first, the ethical consideration, is about bolstering (the image of) the Foundation as a responsible organisation that deals with this information in a professional way. The legal consideration, however, revolves around isolating the volunteers from the Foundation. It's essentially about making sure that they're legally competent so that the Foundation's liability is limited as much as possible.

I should emphasise that I agree with the resolution; I just think it's interesting to observe this cross-current in the rationale.