Thoughts For Deletion: policy

Monday, 29 June 2009

All Quiet on the Waziri Front

There's an interesting piece in the New York Times today on investigative journalist David Rohde - who was kidnapped in Afghanistan last year and who escaped last week from his captors in Waziristan, in north-western Pakistan - and the efforts to extend the media blackout on news of the kidnapping to his Wikipedia article.

The blackout was orchestrated by the New York Times Company and was said to have involved forty international news agencies, from NPR to al-Jazeera. NYT personnel "believed that publicity would raise Mr. Rohde's value to his captors as a bargaining chip and reduce his chance of survival", the story says, quoting Rohde's colleague Michael Moss as saying "I knew from my jihad reporting that the captors would be very quick to get online and assess who he was and what he’d done, what his value to them might be".

Along with staff at other news agencies, NYT personnel contacted Jimmy Wales too, who passed the matter along to a small group of administrators who reverted mentions of the kidnapping and protected the article a number of times over the following months. Michael Moss also apparently edited the article to emphasise Rohde's Pulitzer Prize-winning work on the Srebrenica massacre, as well as his work on Guantanamo Bay, believing that if his captors read the article they might view him as more sympathetic towards Muslims.

Jimbo acknowledges in the NYT piece that the matter was made easier by the lack of reliable sources reporting the kidnapping - a consequence of the blackout - which meant that the biographies of living persons policy could operate to keep any references to the kidnapping out of the article. The policy, of course, was originally intended to keep fabricated material out of articles, but it worked equally well to assist the blackout in this case.

The ethics of the blackout have come into question following Rohde's escape. NPR reported Poynter Institute journalism ethics lecturer Kelly McBride as saying "I find it a little disturbing, because it makes me wonder what else 40 international news organizations have agreed not to tell the public". Dan Murphy at the Christian Science Monitor says that the question of whether the press has a double standard in keeping quiet about their own while regularly reporting on other kidnappings will likely become part of the debate. Greg Mitchell, the editor of industry journal Editor & Publisher, details that organisation's internal debates and ultimate decision to adhere to the blackout. Mitchell raises a potential competing public interest argument, that information about events such as kidnappings in a certain area could, in some cases, help protect the public (the average NYT reader doesn't hang out near Kabul that often, but it might help protect other journalists).

On the Wikipedia front, this is an interesting biographies of living persons case because every aspect of it involves journalists, who as a profession develop, apply and teach a whole suite of ethical principles governing their work, principles that many have suggested Wikipedia ought to adapt or learn from.

It's often true that hard cases make bad policy, and it is so here: the kidnapping was said to have been reported by an unnamed Afghan news agency, and apparently by Italian agency Adnkronos too; the existence of reliable sources on the matter (which I cannot verify due to absent or broken links) throws into doubt the legitimacy of enforcing the blackout on Wikipedia.

This may well put a wedge between two similar but distinct camps of support for the biographies of living persons policy: those who believe that such articles should be written from a "do no harm" perspective, and those who have a similar sympathy but only go so far as supporting a strict, immediatist adherence to ordinary content policy (instead of the typical eventualist stance), and no further.

Thursday, 7 June 2007

Why there cannot be a generic template for fair use claims

Fair use is a legal doctrine that may be used as a defence against a claim of copyright infringement. Technically speaking, until you've actually been to court and successfully invoked your claim of fair use to defend against such a suit, you're using the work illegally. In practice it's often possible to reasonably anticipate where a claim of fair use will be successful, typically by analogy with cases in which the defence has been successfully raised, and as such, the use is commonly regarded as "kosher", as it were, while still technically being illegal.

This reality raises a couple of issues. Since fair use is a defence, it's necessary to be able to explain on what basis your use falls within that defence. Since the defence applies only to particular uses of a work, you need to be able to make such an explanation for all of your uses of the work. And since claims fall into the "kosher" category by being based on solid analogies with existing cases in which the defence has been successfully raised, you need to explain the analogy you have employed, by reference to the specific fair use factors that apply to the particular work and the particular use in question.

There is no boilerplate fair use claim to be used against copyright infringement, just as there is no boilerplate claim for, say, self-defence in a murder trial, or for an estoppel claim in a breach of contract suit. Fair use claims may be very similar to each other, but that only reflects that the particular analogy being employed is strong (or at least popularly thought to be strong).

Executive summary: since fair use is a legal defence, you need to explain how it applies in every case, and this means there can be no boilerplate claims.

Friday, 1 June 2007

The fullness of time

The current debate about the application of the biographies of living persons policy seems focused solely on the question of whether or not Wikipedia should have an article about a person at all, and there have been only a few rare attempts to frame the debate in more nuanced terms than this binary approach. One of the key questions we should be asking, in addition to the question of whether to present information at all, is the question of how that information ought to be presented.

Much as I take a mergist stance in the inclusion/deletion debate more broadly, I think a similar stance is most preferable in this current manifestation of that debate. The fundamental reason is the same: content must be presented in the most appropriate context. The right context allows the proper significance of information to be conveyed, and presents it in a naturally coherent fashion, which is exceptionally valuable for an encyclopaedia.

In the case of biographies of living persons, the question is then whether or not certain information is best presented in a biographical article, whether such an article provides the most appropriate context for the content.

One key issue to consider is the temporal focus of an article.

Articles about an event concentrate on the short-term, are generally tied to one or several discrete points in time and have a narrower scope; even though they sometimes discuss larger issues, they do so through the prism of individual situations. In contrast, biographical articles are focused on the long-term, with scope extending to the entire lifetime of a person.

When we decide where to include material that relates to a person's involvement in an event, we ought to consider the proper temporal focus of the sources for that material. Sources with a short-term focus, that discuss a person in order to discuss a particular event, should be used to develop articles about the event, and not biographical articles about the person. On the other hand sources with a long-term focus, that discuss a person, often by way of discussing a series of events, should be used to develop biographical articles.

Many of the biographical articles that have been causing problems lately are drawn largely from news sources. News coverage almost always concentrates on the here and now; if it writes about people, it is usually writing about them only insofar as they are part of a particular event, that is, only insofar as their lives intersect with that discrete point or points in time. This is the same even for "human interest" type pieces that seem to be about people: really they have the same short-term focus, the journos are just looking for a different angle to help sell the story.

Choosing to put content in a biographical article becomes increasingly appropriate the more that the content is drawn from sources with a long-term focus. Where the only sources available are short-term, event-focused sources like news coverage, then it must be questioned whether the content should be presented as a biographical article, and in most cases (especially where the news sources are all about one event) it probably should not.

Lastly, in all of this we must not forget Wikinews, a project which is intended precisely for the type of coverage which is not always proper for inclusion in an encyclopaedia: news coverage, with a narrow, short-term focus on its subjects. Wikinews is surely a far more appropriate venue for many of these types of articles, since fundamentally it concentrates on knowledge that is important at a particular point in time.

The exhortation that "we have a really serious responsibility to get things right" in the context of biographies of living persons applies not only to what content we present, but also to the manner in which we present that content. We must ask ourselves, what is the most appropriate context for this information? Is it really the most desirable choice to present this information in a biographical article? Whenever the answer is no, then look to other articles instead, where context may be better established, or else look further afield, to projects like Wikinews.

Wikipedia is an encyclopaedia, not a newspaper. Its content must be developed with this difference in temporal focus in mind.

Sunday, 13 May 2007

A threshold question

David Gerard has set a challenge to the Wikipedia community, to "construct a useful notion of 'notability' using only neutrality, verifiability and no original research". Here's my attempt.

David identifies the extraordinary subjectivity of the "notability" concept as one of its key failings, and I tend to agree. That would seem to be a fairly widely accepted position, since the most broadly accepted variations on notability have been ones which introduce some objective standard; for example the requirement for a certain number of sources addressing a subject. These are obviously not perfect or necessarily desirable standards but they're a good starting point. I think a useful notion of notability will be something that flows naturally from the core content policy and which is also capable of being at least partially objective in application.

During the Attribution debate I started work on an essay on what I think ought to be done about our content policies. It is still unfinished, but presently it is roughly three parts the essay I intended and two parts Theory of Everything.

In the essay, I attempt to show how all of our content policy may more or less be derived from the neutral point of view policy and the goal of accuracy and reliability, whether it be a principle such as not accepting original research, or a mechanism such as only utilising reliable sources. I think a similar approach can be taken with respect to notability.

A basic definition of "notability" is that a subject, to warrant coverage in Wikipedia, must have attracted some degree of attention from the outside. The difficulty has always been in identifying that degree. In my view, the threshold ought to be where achieving a neutral point of view becomes reasonably practicable, while using only material attributable to reliable sources and without resorting to original thought.

In practice this comes down to the number and quality of sources available, though it's not a matter of counting them. The question to be asked is, essentially, can NPOV be achieved with the sources available? Are there sufficient sources available, and are they of sufficient quality, that it is reasonably practicable to fairly portray all significant points of view on a subject, in accordance with their prevalence? If not, then the subject is not sufficiently notable.

So that's the principle; I'll leave the practical implementation of this for later. As a final note, consider this: two of the most problematic types of articles out there are the hack job and the puff piece (particularly the former). People doing OTRS or BLP work will have come across these types of articles time and time again. Consider the way these types of articles could be approached differently with a concept of notability based on the feasibility of achieving NPOV.

Monday, 7 May 2007

Time for a rethink on conflict of interest

I've been thinking lately that we need to reshape our approach to the conflict of interest guideline, which is about people who have a conflict between their own interests and the interests of Wikipedia, whether that be a financial interest (eg, people paid to edit), a personal interest (eg, people writing about themselves) or some other interest.

I'll identify some current issues with the guideline, and some ways in which I think it can be improved. But first, a little bit about the history of the guideline.

The page began in May 2004 under a different title, "vanity guidelines". It focused on articles created by people very close to the subject of the article - usually autobiographies or articles about relatives or companies related to the person. Interestingly, the oldest version of the page contained the advice that "vanity" articles should not be deleted simply because they may have been written by someone very close to the subject, but because of the problems that they almost invariably have: a lack of neutral point of view, the inclusion of much non-encyclopaedic content and so forth.

The page has always been related to the autobiography guideline (which has a heritage back to July 2003) though they never merged. The guidelines were gradually developed over time (becoming increasingly wordy, naturally); though they always had a focus on the creation of new articles, and how editors ought to approach those, they eventually moved to cover topics such as citing yourself - mainly addressing the issue of academics citing scholarly material authored by themselves.

In October 2006 the vanity guidelines were renamed as the conflict of interest guideline - mainly to avoid the disparaging term "vanity" but also to emphasise that the guideline wasn't just addressing people writing about themselves, but people writing about any subject that they were too close to. The need for a rewrite was declared, and by the end of the month it had more or less taken on its present form, although there have been many more changes since then.

The development of the guideline has inevitably been bound up with particular conflicts, especially relating to MyWikiBiz, as well as several prominent arbitration cases. It has evolved from focusing on writing about oneself to writing in a much more varied range of situations. In doing so it has also changed from being essentially a page offering advice to a page offering imperatives, even though that is not always its intention. I'm particularly concerned at recent trends which seem to be moving in the direction of penalising users simply for being in a position of conflict of interest, as opposed to producing poor content as a result of their conflict.

The problems that we ought to be addressing are not that someone is editing with a conflict of interest, but that someone is producing content which is not neutral, or is unverified, or is original research. The problem is with the content that is actually produced. Granted, a person with a conflict of interest is probably more likely to produce problematic content than a person without a conflict, but that doesn't mean that the person always will (nor that the person without never will).

To put it another way, the existence of a conflict of interest may explain why someone is producing problematic content, but it is not the problem in its own right. The problem is the actual content someone produces.

I realise that this is easy to say, but hard to reconcile with practical reality. At this time I think the community needs to resolve the tension arising from the question of whether we desire people with conflicts of interest to continue to edit, or whether we would prefer that they not edit at all. Either approach has its downsides; as Charles Matthews identifies, an approach of discouragement raises issues of a conflict with the necessity of assuming good faith, but a more relaxed approach risks achieving nothing more than restating the content policies.

My preferred approach would be to discourage people with conflicts from editing as much as possible, but at the same time, give them as great an opportunity to participate through other means as is practicable. If editing material about which one has a conflict of interest is taken away as a socially acceptable option, it needs to be replaced with something else.

In a recent completely new draft of the guideline, I identified the use of talk pages, requests for comment and the OTRS system as such avenues. Hopefully the community will be able to identify a much better range of methods than this. Ideally they should involve mediating content through the community and should have a low barrier to entry.

As a final point for now, I think we also need to be more welcoming and less suspicious of people with conflicts of interest, on the proviso that at the same time we strongly discourage them from editing. We need to create an environment where disclosing a conflict and working with other editors to produce valuable content is a truly viable option for people with conflicts. A heavy-handed approach will force people underground in attempts to sneak content in, and that can only be counter-productive.

I have probably raised more questions than I have answered here. Hopefully this will get people interested in developing a more robust and well-rounded approach to conflict of interest.

Thursday, 3 May 2007

Wikipedia is not Thermopylae

The HD DVD encryption key controversy rages on, and while Digg goes out on a slender limb, other user-generated content communities, including the Wikimedia family, are still deciding what to do.

The good news is that the community seems, for the most part, to be taking a sensible course of action and rejecting attempts to put the contents of the key into Wikipedia. There are a few dissenters though, most of whom are beating the "censorship" drum and complaining about oppression of the masses and their rights to free speech.

There's a section in what Wikipedia is not entitled "Wikipedia is not censored", which is often clung to by these people who rail against "the Man". If memory serves me correctly, the section used to be titled differently; "Wikipedia is not censored for the protection of minors" at one stage, and "Wikipedia is not censored for good taste" at another. The latter of these is the better, in my opinion, because the point of the statement is to explain that we can neither guarantee that all content will comply with some standard of good taste, nor will we exclude content that some people find objectionable (encyclopaedic material about sex, for example).

The problem is that some people try to boil that down to a slogan, "Wikipedia is not censored", and get themselves confused. While it's correct to say that we generally don't exclude content for reasons of taste based on social or religious norms, we undoubtedly do exclude content based on our policies and based on the laws of Florida and of the United States, where the projects are based.

It also needs to be remembered that while Wikipedia is not censored (for good taste), it is not a whole bunch of other things too. It's not a soapbox, for starters, nor is it an experiment in democracy, or anarchy. It's especially not a tool for experiments in civil disobedience. It's an encyclopaedia.

United States District Judge Lewis A. Kaplan put it well in the case of Universal v Reimerdes:

"Plaintiffs have invested huge sums over the years in producing motion pictures in reliance upon a legal framework that, through the law of copyright, has ensured that they will have the exclusive right to copy and distribute those motion pictures for economic gain. They contend that the advent of new technology should not alter this long established structure. Defendants, on the other hand, are adherents of a movement that believes that information should be available without charge to anyone clever enough to break into the computer systems or data storage media in which it is located. Less radically, they have raised a legitimate concern about the possible impact on traditional fair use of access control measures in the digital era. Each side is entitled to its views. In our society, however, clashes of competing interests like this are resolved by Congress. For now, at least, Congress has resolved this clash in the DMCA and in plaintiffs' favor."

So to all the geeks itching to fight the Man, go write to your Congressman (if you live in the US and have a Congressman of course - if you don't, then I suppose you'd better nag your American friends to do so). And if you want to engage in civil disobedience, don't abuse Wikipedia in order to do so. It's not your farm to bet.

Monday, 2 April 2007

Interesting exercise

With the debate about the Attribution policy merger still going strong, I got to reading the position papers prepared by some of the prominent proponents on each side (broad agreement and broad disagreement) of the debate. While considering those, and some of the responses in the ongoing poll, it struck me that there is a remarkable degree of difference in understanding of Wikipedia's fundamental content policies.

I was particularly intrigued by some of the comments on both sides of the debate which have discussed the ways in which Wikipedia policies have evolved over time; the people supporting Attribution arguing that policies have always been changed, and some of the people opposing it arguing that policies have changed away from their original meaning. This got me thinking about the degree to which change has actually occurred with these long standing policies.

I've always thought that the core policies in particular were essentially well understood concepts that haven't really changed much, and that the development of policy pages over time has merely been a refinement of the expression of the central idea, and an adaptation to meet changing circumstances. I decided to test whether this was really the case by, quite simply, looking at old versions of policy pages.

Here's how the core content policies looked on my first day of editing (8 October 2004):

For both verifiability and no original research, the version that existed when I first edited was within the first fifty revisions of the page. Indeed, verifiability had only been edited by a dozen different users by the time of this version. NPOV had something of a longer history, having been around since the beginning of 2002 (and longer than that as an idea).

There are a few interesting nuggets in these old versions. Most surprising to me is that in the old version of no original research, the page posits Wikipedia as either a secondary source or as a tertiary source, whereas I've always considered Wikipedia to be only a tertiary source, as a necessary consequence of having the NPOV policy. You'll see that the old version excludes original ideas, but permits analysis, evaluation, interpretation and synthesis as legitimate techniques in writing articles. This would, I am sure, come as a surprise to many (my homework for today: find when the prohibition on synthesis was introduced).

It's also interesting to observe that contrary to what some people assert, the verifiability policy even back then was all about checking that sources have been used accurately and correctly (ie, not misrepresenting the sources), and not about only including content that can be proved to be true. The old version of verifiability also included a section about reliable sources, and offered a classic formulation which I still regard as eminently valid:

"Sometimes a particular statement can only be verified at a place of dubious reliability, such as a weblog or tabloid newspaper. If the statement is relatively unimportant, then just remove it - don't waste words on statements of limited interest and dubious truth. However, if you must keep it, then attribute it to the source in question. For example: 'According to the weblog Simply Relative, the average American has 3.8 cousins and 7.4 nephews and nieces.' "

I'm sure that there is more to be learned from these old versions of policy. I would encourage everyone who is interested in this subject to check out a history page for yourself: perhaps you'd like to view the policies as they existed when you first edited, or perhaps you'd like to delve even deeper into the past than that.

The historical development of policy can offer great insights into how it can be developed in the future.

Sunday, 25 March 2007

Chunky or smooth?

While thinking about the current discussion about the proposal to merge several of Wikipedia's content policies into a new policy, Attribution, my gut feeling was that the core content policies (verifiability, no original research and neutral point of view) are better off treated as separate concepts that are nevertheless to be applied in conjunction with one another, rather than trying to join some of the concepts together. As I thought about it, the best reason I could think of to explain this reaction was to do with the way that the concepts operate in different ways, and thus how they need their space in order to operate properly.

The verifiability policy, as it is currently called, focuses on discrete "chunks" of content. The basic idea is that it should be possible for any reader to find the material in an extant reliable source. I'll call this the chunky level. The neutral point of view (NPOV) policy, on the other hand, operates at a higher level: it is concerned with what is done with these verifiable chunks of material, how they are put together. The core concept there is that, looking at the final article, all significant views on a subject should be presented fairly, in accordance with their prevalence (that is, not giving undue weight to any given view). The neutrality of individual chunks isn't what matters, but rather the overall impression. I'll call this the smooth level.

The prohibition on original research sits somewhere in between these two in terms of the way in which it operates. It applies to individual "chunks" of content, in that each must not be original thought, but it also works on a broader level by prohibiting original research by synthesis. It's not a small picture or big picture thing: it's everywhere, at every level, from every angle.

There is undoubtedly some overlap between the policies on verifiability, NPOV and no original research, but I don't think that that's inherently a problem, nor do I think that when it becomes a problem that problem can be solved by merging the policies, because the policies operate in different ways.

Merging the NPOV and no original research policies, say, would lessen the force of the prohibition on "chunk-style" original research by focusing on the overall picture painted by the chunks when put together. Similarly, merging the verifiability and no original research policies - the thrust of the attribution proposal - lessens the force of the prohibition on original research by synthesis by focusing on the "chunk" level and not on the way that the chunks are used.

I think that the best way forward would be to merge elements of the policies and guidelines on sourcing into the verifiability policy, and rename that the "attribution" policy (the name "verifiability" is often misunderstood), and to maintain the other core content policies separately. Naturally, where overlap or bloat becomes a significant problem, then the policies need to be trimmed, and I think this is where efforts need to focus from now on.
