Thoughts For Deletion

Hollywood v the Internets

2010-02-04T19:07:00.004+11:00

Australian copyright law has a new landmark decision as of this morning, with Justice Cowdroy of the Federal Court of Australia handing down his decision in the Roadshow Films v iiNet Limited case, in which the misleadingly-named Australian Federation Against Copyright Theft (AFACT) sued iiNet, Australia's third-largest ISP, alleging copyright infringement. The case is significant in several ways both for ISPs and for operators of Internet services in Australia.

AFACT is a consortium of Hollywood movie studios who alleged that iiNet customers infringed copyrights owned by them in certain films by distributing copies via the BitTorrent file sharing protocol, and that iiNet itself had infringed by authorising its customers' infringements. AFACT had engaged an anti-piracy software firm to track the transmission of films over BitTorrent by IP addresses allocated to iiNet, and had then sent notices to iiNet warning them of the infringements and requesting that the ISP take action against the customers concerned. iiNet argued that it had not authorised any infringements. It also argued that privacy provisions in telecommunications legislation prevented it from acting upon any notices sent to it, and alternatively that it was protected from litigation by safe harbour provisions in copyright legislation.

Cowdroy J today held that while iiNet customers had infringed copyrights owned by AFACT members, iiNet had not authorised these infringements, for three reasons:

that one can distinguish "the provision of the 'means' of infringement compared to the provision of a precondition to infringement";
that any scheme for acting on AFACT notices would not constitute a relevant power or a reasonable step available to prevent infringement (within the meaning of s 101(1A) of the Copyright Act, which sets out factors that must be considered in assessing authorisation); and
that iiNet did not sanction or approve of copyright infringement by its customers.

Cowdroy J held that the means of infringement in this situation was the BitTorrent system (the protocol, trackers and clients) and not iiNet's network, thus distinguishing classic authorisation cases such as University of New South Wales v Moorhouse (involving a university library that provided photocopiers for the use of library patrons) as well as more recent Internet-centric cases such as Universal Music v Sharman Licence Holdings (in which Sharman was found to have authorised infringements via its Kazaa file-sharing software, with which Sharman both refrained from preventing infringement and actively encouraged infringement).

Distinguishing in this way the ultimate means from mere preconditions injects some clarity into the test for authorisation, which has largely revolved around degrees of control and of encouragement (Cowdroy J's second and third reasons mentioned above go to this classic test). This approach was obviously advantageous for iiNet. However, for operators of services such as wikis and social-networking sites, this approach would seem to render it more likely that they would be found to be authorising any copyright infringements by users, by providing the means of infringement such as a file upload facility or the ability to edit pages.

Without authorisation, AFACT's case thus failed; however, Cowdroy J went on to consider iiNet's other arguments in its defence anyway, in the event of an appeal (which would seem highly likely). He held that iiNet would not have been protected by s 112E of the Copyright Act, which protects telecommunications providers from being held to authorise infringement merely through providing the telecommunications service used to carry out the infringement. However, he found that iiNet would have been protected by the safe harbour provisions in the Copyright Act (s 116AA ff) because it had a "reasonably implemented" policy for dealing with repeat infringers.

These safe harbour provisions were based on the United States' OCILLA safe harbour provisions, although while the American provisions extend to "online service providers" (including website operators) the Australian ones are limited to "carriage service providers", that is, ISPs themselves. To my knowledge this is the first case to seriously address these provisions, and Cowdroy J notably utilised American OCILLA jurisprudence in doing so. Thus it seems that the safe harbour provisions will provide reasonably strong protections for ISPs, although with the current form of the legislation, this is of little comfort to online service providers.

The decision is significant in the context of Australian copyright law, and will be a boon for ISPs operating in Australia. However, for online service providers (such as operators of wikis), the substance of the decision will only serve to underline their precarious legal position in Australia, as opposed to their American counterparts, when it comes to copyright infringement by users of their services. They are not protected by safe harbours, and a "means"-based test for authorisation may well be worse than the more traditional control/encouragement test, if indeed it replaces it (it may merely augment it).

The silver lining however may be in Cowdroy J's rhetoric. His discussions of AFACT's nature and objectives, of its arguments and trial conduct, and of its attempt essentially to foist upon iiNet a positive obligation to protect its members' copyright interests, are enlightening. Robert Corr extracts some choice quotes here. Following last year's even more significant landmark decision by the High Court of Australia in the epic IceTV case, there would seem to be a healthy desire, in certain quarters of the legal community, to reevaluate some of the more extremist trajectories in Australian copyright law.

What happens to unreferenced BLPs?

2010-01-30T13:45:00.000+11:00

Those of us who live other than under rocks will no doubt be aware of the latest controversy over Wikipedia's approach to biographies of living persons articles (BLPs), concerning the deletion last week of a large number of BLPs that had been tagged as being unsourced, and had not been edited for more than six months. The deletions sparked a giant administrators' noticeboard discussion, a request for arbitration and now a request for comments on how to proceed from here.

At the crux of the dispute is how seriously the project is to take the modified standards that it has adopted with respect to biographies of living persons.

Debates of this sort are usually run along inclusionist/deletionist lines, but really the more important philosophical dichotomy when it comes to BLPs is between eventualists and immediatists. Wikipedia on the whole favours an eventualist perspective - facilitated by the almost immeasurably large potential pool of labour out there - but the BLP policy is essentially a localised switch to immediatism: unsourced material needs to be sourced post-haste, or else removed.

Conceptually it's an elegant and attractive approach. But a major flaw with it is our attraction to eventualism. We just can't shake it off.

This category, and its many subcategories, tracks BLP articles that have been tagged as not having any sources. At the time of writing there are over 47,000 of them, some having been tagged as long ago as December 2006. Evidently any sense of urgency has passed those by. The backlogs mount until they approach the point where individual editors have difficulty comprehending the problem, let alone working to address it. Frustration builds at the inevitable inertia, until something radical happens, like these mass deletions.

Is this view accurate? Is the problem of unsourced BLPs really out of hand? We can try to answer these questions by looking at the way the backlog has been managed.

Unfortunately, the data available for this purpose is somewhat limited. Database dumps older than the 20 September 2009 dump are currently not available due to maintenance. However that September dump, along with dumps from 28 November 2009 and 16 January this year (shortly before the deletions started), do offer three data points with which to commence.

The monthly subcategories from October 2006 to August 2009 inclusive were common to all three dumps. The total number of articles in these categories declined from 50,715 in September to 43,655 earlier this month, a 13.9% fall. However, over the same period, the total in all subcategories through December 2009 rose from 50,715 to 51,301, a 1.2% increase. At least over this period, new additions outweighed articles being removed from these categories.
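The percentages above come from a plain relative-change calculation. As a minimal sketch in Python, using only the totals quoted in this post (the function and variable names are mine, purely for illustration):

```python
def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


# Totals quoted in the post above (variable names are illustrative only).
common_sept = 50_715  # Oct 2006 to Aug 2009 subcategories, September dump
common_jan = 43_655   # the same subcategories, January dump
all_jan = 51_301      # all subcategories through December 2009, January dump

fall = percent_change(common_sept, common_jan)
rise = percent_change(common_sept, all_jan)

# Articles sitting in the January subcategories that are not common to all dumps.
newer_subcategories = all_jan - common_jan

print(f"common subcategories: {fall:.1f}%")
print(f"all subcategories: {rise:+.1f}%")
print(f"in newer subcategories: {newer_subcategories}")
```

The last figure is simply the difference between the two January totals, so it counts whatever sits in the subcategories created after August 2009; it says nothing about how those articles got there.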

It should be noted that some of these additions are due to articles that had been tagged, but were unsorted, being added into the monthly subcategories. In fact, ten of the thirty-five subcategories common to all three dumps saw increases in numbers since September. The following graph shows the change in the monthly category totals over the roughly four months between the September and January dumps:

Without analysing the actual changes in the lists of articles in these subcategories it won't be possible to tell whether the sorting process is merely outweighing the normal reductions through articles being referenced or deleted, or, as I suspect, if there are genuinely fewer reductions in these subcategories that are no longer recent, but not yet the oldest. This can be the subject of further inquiry.

What we can say now is that the total number of unreferenced BLPs is showing real decline for the first time in at least four months, possibly longer. It seems to have been the shock of mass deletions that has spurred people into action either to fix or delete these articles. Hopefully the shock will last long enough for a significant reduction to be achieved.

WikiReader

2009-10-13T23:56:00.000+11:00

Openmoko, a group which produces and distributes an open-source mobile phone environment, as well as phones to run it, has released the WikiReader, a dedicated device for reading Wikipedia. The WikiReader has a 240 by 200 pixel touchscreen and uses a compressed, text-only version of Wikipedia stored on a microSD card. Users can subscribe to receive quarterly updated copies on a new microSD card, or download the updates for free.

There are many implementations out there for reading Wikipedia on mobile devices, but to my knowledge this is the first dedicated Wikipedia reading device. However, beyond the inherent simplicity that a dedicated device provides, it's difficult to see many advantages to the WikiReader over other options.

One of the major advantages of Wikipedia is its up-to-the-minute coverage, and as an offline device (even with quarterly updates) the WikiReader loses this advantage. Mobile online access to Wikipedia has not been the best in the past, but the Wikipedia mobile portal has received plenty of tender loving development recently and is now quite decent, even on older devices. Aside from this mobile web interface, there are also dedicated Wikipedia reading apps for devices such as the iPhone.

Naturally not everyone has mobile internet access, or is always in a location where it is available, so offline methods are essential for many people. But there are plenty of implementations available for other devices, such as Encyclopodia, for the iPod family, or a TomeRaider format Wikipedia ebook.

Of course, the convenience of Wikipedia has been central to its success, and the convenience of a dedicated device may outweigh its disadvantages. It will be difficult for the WikiReader to succeed, however, when there is so much more flexible competition out there.

Arbitration Committee mail traffic

2009-07-01T13:10:00.002+10:00

Some brief traffic statistics on the Arbitration Committee's mailing list:

a total of 14692 messages were received by the list from January through June this year
an average of 81 messages were received each day
this is more than foundation-l (4473), wikien-l (4015) and wikitech-l (2924) combined over the same period, with change left over
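For anyone wanting to check the arithmetic, here is a small Python sketch using only the figures listed above. The variable names are mine, and the period is assumed to run from 1 January to 30 June 2009 inclusive:

```python
from datetime import date

# Figures quoted in the list above.
arbcom_total = 14_692
other_lists = {"foundation-l": 4_473, "wikien-l": 4_015, "wikitech-l": 2_924}

# Assumption: "January through June" means 1 January to 30 June 2009 inclusive.
days = (date(2009, 7, 1) - date(2009, 1, 1)).days

per_day = arbcom_total / days
combined = sum(other_lists.values())
left_over = arbcom_total - combined

print(f"days in period: {days}")
print(f"messages per day: {per_day:.0f}")
print(f"other three lists combined: {combined}")
print(f"change left over: {left_over}")
```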

Conclude from this what you will.

All Quiet on the Waziri Front

2009-06-29T14:44:00.000+10:00

There's an interesting piece in the New York Times today on investigative journalist David Rohde - who was kidnapped in Afghanistan last year and who escaped last week from his captors in Waziristan, in north-western Pakistan - and the efforts to extend the media blackout on news of the kidnapping to his Wikipedia article.

The blackout was orchestrated by the New York Times Company and was said to have involved forty international news agencies, from NPR to al-Jazeera. NYT personnel "believed that publicity would raise Mr. Rohde's value to his captors as a bargaining chip and reduce his chance of survival", the story says, quoting Rohde's colleague Michael Moss as saying "I knew from my jihad reporting that the captors would be very quick to get online and assess who he was and what he’d done, what his value to them might be".

Along with staff at other news agencies, NYT personnel contacted Jimmy Wales too, who passed the matter along to a small group of administrators who reverted mentions of the kidnapping and protected the article a number of times over the following months. Michael Moss also apparently edited the article to emphasise Rohde's Pulitzer Prize-winning work on the Srebrenica massacre, as well as his work on Guantanamo Bay, believing that if his captors read the article they might view him as more sympathetic towards Muslims.

Jimbo acknowledges in the NYT piece that the matter was made easier by the lack of reliable sources reporting the kidnapping - a consequence of the blackout - which meant that the biographies of living persons policy could operate to keep any references to the kidnapping out of the article. The policy, of course, was originally intended to keep fabricated material out of articles, but it worked equally well to assist the blackout in this case.

The ethics of the blackout have come into question following Rohde's escape. NPR reported Poynter Institute journalism ethics lecturer Kelly McBride as saying "I find it a little disturbing, because it makes me wonder what else 40 international news organizations have agreed not to tell the public". Dan Murphy at the Christian Science Monitor says that the question of whether the press has a double standard in keeping quiet about their own while regularly reporting on other kidnappings will likely become part of the debate. Greg Mitchell, the editor of industry journal Editor & Publisher, details that organisation's internal debates and ultimate decision to adhere to the blackout. Mitchell raises a potential competing public interest argument, that information about events such as kidnappings in a certain area could, in some cases, help protect the public (though the average NYT reader doesn't hang out near Kabul that often - it might help protect other journalists though).

On the Wikipedia front, this is an interesting biographies of living persons case because every aspect of it involves journalists, who as a profession develop, apply and teach a whole suite of ethical principles governing their work, principles that many have suggested Wikipedia ought to adapt or learn from.

It's regularly true that hard cases make bad policy, and it is so here: the kidnapping was said to have been reported by an unnamed Afghan news agency, and apparently by Italian agency Adnkronos too; the existence of reliable sources on the matter (which I cannot verify due to absent or broken links) throws into doubt the legitimacy of enforcing the blackout on Wikipedia.

This may well put a wedge between two similar but distinct camps of support for the biographies of living persons policy: those who believe that such articles should be written from a "do no harm" perspective, and those who have a similar sympathy but only go so far as supporting a strict, immediatist adherence to ordinary content policy (instead of the typical eventualist stance), and no further.

New tools

2009-05-17T12:42:00.001+10:00

A couple of new tools I've put together that people might find some use for:

Admin activity statistics: shows some statistics on how many admins have used their tools at all over various timeframes, and on how many actions are taken by each active admin over various timeframes. Works on any Wikimedia project.
Per-page contributions: like [[Special:Contributions]], but shows contributions just to a particular page. Works on any Wikimedia project. I've already found it quite useful in several arbitration cases, especially for users who have made a large number of edits, or for pages which have been edited many times.

The image below is one of the graphs produced by the admin activity tool; it shows how many admins have performed at least one administrative action over various timeframes on the English Wikipedia:

More bug statistics

2009-04-20T23:55:00.003+10:00

Last November I put together some simple charts with the information from the weekly bug statistics that are automatically generated for the wikitech-l mailing list. There are now thirty-two weeks of data available, so here are some updated charts.

The distribution of resolution types seems to have stayed more or less the same over time, continuing the pattern seen in the original charts:

However, there are some changes in the other graph, which is based on information about the number of bugs each week. It shows the number of new, reopened, assigned and resolved bugs each week (using the scale on the left) and the total number of open bugs (in blue, using the scale on the right):

While there is still the same rough correlation between the number of new bugs and the number of bugs resolved each week, there is also a steady trend upwards in the total number of open bugs. Indeed, the total has risen nearly 20% since October last year.

So what are the consequences of so many bugs being opened but not dealt with? The following chart, generated by Bugzilla directly, shows the distribution of the "severity" parameter of all currently open bugs:

It shows that three-fifths of open bugs have severity given as "enhancement", essentially meaning that they're feature requests, entered into Bugzilla for tracking purposes, rather than being true bugs. A further 13% are marked "trivial" or "minor", and nearly a quarter "normal"; only 3% are "major".

So while the number of unresolved bugs is steadily rising, most of these are either feature requests or only minor bugs. Still, the backlog is fairly steadily getting worse, a reminder that it's constantly necessary for new volunteer developers to become involved with improving MediaWiki.

Parts of Wikipedia blacklisted in Australia

2009-03-19T19:19:00.000+11:00

The Australian Communications and Media Authority (ACMA) has added whistleblower website Wikileaks to its secret website blacklist. This comes after Wikileaks published a recent version of the blacklist, which includes Wikipedia pages, in addition to various religious websites and the site of a Queensland dentist.

In February an anti-censorship activist submitted a Wikileaks page (containing a copy of Denmark's secret blacklist!) to ACMA's online complaints facility, as a test of ACMA's guidelines. ACMA blacklisted that page, satisfied that it was "prohibited content" or "potential prohibited content" under the relevant legislation. However Wikileaks then published details of the report, including the correspondence, and then published a leaked copy of the ACMA blacklist from last August. Following this, ACMA blacklisted the entire Wikileaks site.

As of the time of writing, it does not seem possible to access Wikileaks from Australia, so I do not know what is on the leaked blacklist. But media reports indicate that, in addition to the intended targets of child porn sites, there is a substantial minority of other sites blacklisted, including some Wikipedia pages, YouTube videos, and online gambling sites, as well as a few bizarre examples in a tuckshop management company and an animal carer group.

The responsible minister, Senator Stephen Conroy, has denied that Wikileaks' list is the real thing, and one of the ISPs involved in the mandatory internet filtering trial has backed that up, saying that it is not the same as the list supplied to them recently.

Yet whether Wikileaks' list is accurate or not, the attention now being paid to the practices of ACMA in relation to the blacklist has at least exposed the risk to educational sites like Wikipedia posed by similar censorship systems. The ACMA blacklisting scheme is designed to dovetail with Australia's existing content classification system (for films, television etc) by defining "prohibited content" to mean content classified as RC (refused classification) or X 18+ by the Classification Board (and also R 18+ content to which unrestricted access is allowed, and under certain circumstances, M 15+ content).

This system has been criticised in a number of ways, not least because Internet content is subject to the film and television classification rules, rather than the rules for publications (with the result that, for example, a printed newspaper and a newspaper website showing the same material will be treated differently, depending on which version is classified first). Nevertheless, the Classification Board has extensive experience in content classification, and, as it is a singular organisation whose decisions are subject to review, is at least broadly consistent in its application of the guidelines.

The blacklisting scheme goes further, however, and allows ACMA to blacklist not only content which has actually been classified, but also "potential prohibited content", that is, unclassified content which it believes would ultimately be prohibited if it were classified. In practice, this means that ACMA bureaucrats - whose decisions are not subject to the same process of review, and are not even guaranteed to be made in the same way and applying the same process as the Classification Board - can blacklist sites if they think there is a "substantial likelihood" that the content would be prohibited.

Under the National Classification Code (PDF), classification not only depends on what the content depicts, but on the manner in which it is depicted. Relevantly for Wikipedia, educational materials covering subject matter like sexuality will likely be treated differently than other genres of material depicting the same subjects. With this parallel ACMA scheme, there is no guarantee of consistency, no guarantee the code will be correctly applied and no prospect of review. Thus, the public's access to legitimate educational content, such as Wikipedia articles, is subject to the whims of ACMA bureaucrats.

A related problem is that the ACMA blacklist is the basis of the aforementioned proposed mandatory internet filtering scheme in Australia, which aims to filter the Internet at the ISP level. Depending on the way such a scheme (if it is actually instituted, which seems unlikely at this time) is actually implemented by ISPs, we may end up with a situation in which access to Wikipedia is widely blocked, as happened recently in the UK.

Maryland court rejects identification subpoena

2009-03-08T23:27:00.000+11:00

Zebulon Brodie, a franchisee for Dunkin' Donuts, sued Independent Newspapers (operator of the Newszap.com classifieds and forums website) and three pseudonymous members of the site for defamation and conspiracy to defame, after the three participated in a forum thread in which the cleanliness of the store was critiqued.

The liability of Independent Newspapers (IN) was fairly easily resolved: the trial judge found that the company, as the provider of an "interactive computer service", could not be treated as the publisher or speaker of the forum postings due to s 230 of the Communications Decency Act, and as such could not be liable in defamation for the postings' contents. This provision has protected a range of service providers from liability for defamation and similar actions, including the Wikimedia Foundation itself.

However, the liability of the three pseudonymous users is a different story, and it was this issue that has been contentious in the case. The Newszap website required users to register before using the forums, and Brodie sought, by way of a subpoena, to compel IN to identify a total of five pseudonymous users who had participated in the forum thread. In turn, IN sought motions to quash the subpoena, and for a protective order to be issued; however, the trial judge rejected those motions, and ordered IN to identify the users.

The Maryland Court of Appeals overturned that order in a decision published this week (PDF). The basis for the decision was that three of the users did not make any comments that were actionable in defamation, and the other two, though they did make arguably actionable remarks, were not actually named as defendants in Brodie's original complaint (and by the time the case had proceeded to that point, any action against the two was barred by limitations provisions).

Though the case was thus resolved on an essentially procedural point, the Court of Appeals nevertheless went on to discuss the underlying question of when anonymous or pseudonymous users in such situations should be identified, and offered some guidance to lower courts.

All seven judges agreed on four steps that should be undertaken by courts considering defamation actions involving anonymous or pseudonymous defendants, where disclosure is sought:

require the plaintiff to make efforts to notify those defendants of any subpoena or application to disclose their identity - in the context of Internet forums, by posting a message there;
allow those defendants reasonable opportunity to oppose the application;
require the plaintiff to clearly identify the speech said to be actionable in defamation; and
determine whether the plaintiff has advanced a prima facie case against those defendants.

However, four judges comprising the majority went further, and added a fifth step that courts should undertake: if all the other requirements were satisfied, the court should weigh the strength of the prima facie case against the anonymous or pseudonymous defendants' First Amendment rights.

First Amendment jurisprudence concerning free speech has tended to recognise that an author's decision whether or not to disclose their identity may be protected as much as the content of their speech itself. In practice, this has translated into, for example, the Supreme Court of the United States striking down a local council ordinance requiring anyone soliciting door-to-door (in that case, Jehovah's Witnesses) to identify themselves and obtain a permit before doing so. While anonymity, like any other aspect of the right to free speech, does not protect speech which is defamatory, the majority were keen to point out that anonymous or pseudonymous posters have a right "not to be subject to frivolous suits for defamation brought solely to unmask their identity." In their view, the additional balancing test, beyond the prima facie requirement, was necessary to give adequate protection to this right. A lower standard of protection, in their view, "would inhibit the use of the Internet as a marketplace of ideas, where boundaries for participation in public discourse melt away."

The three judges who dissented as to the need for the balancing test were of the view that the prima facie requirement provided sufficient protection of First Amendment rights, given that they are already taken account of in the ordinary law of defamation. Judge Adkins, writing for the minority, cautioned that "the majority decision invites the lower courts to apply, on an ad hoc basis, a 'superlaw' of Internet defamation that can trump the well established defamation law."

The case is an interesting example of the way in which computer services providers who are protected by section 230 nevertheless have a significant role to play in legal processes that reach past them to target users of their services. The court also placed emphasis on ensuring that anonymous or pseudonymous users have an opportunity to participate in legal processes before their identity is disclosed. As a consequence, providers are not merely passive targets for subpoenas, nor must they be zealous defenders of all users of their services; rather, they have an important mediative role.

Small screen$

2009-01-22T20:02:00.001+11:00

Correction

Angela subsequently provided a correction for this post; the error was in the cited publication:

"The new platform has nothing to do with Nokia as far as I know. I've sent a correction to the author of that article. The Foundation already has a branding deal with Nokia but that's not related to this."

Angela Beesley (former member of the Wikimedia Foundation Board of Trustees, and current chair of the Advisory Board), has indicated during a talk at Linux.conf.au that the Foundation will be announcing a new mobile platform for Wikipedia later in the year. According to ZDNet Australia, the platform is currently under development and will be licensed to Nokia.

There are already a number of iPhone apps for reading Wikipedia, whether online or offline, including plenty of good free apps. In addition to dedicated apps, there is a solid specialised Wikipedia mobile interface, and there have been efforts to make the regular web interface of Wikimedia wikis more practicable in mobile browsers.

It's good news, however, to hear that a dedicated Wikipedia interface is on its way for one of the closed mobile platforms. What's also interesting is to hear that the platform will be licensed to Nokia, which makes it sound as if there's some commercial arrangement involved. It's interesting in that Angela's talk also touched on the recent success of the fundraiser, and the possibility of alternative sources of funds, such as selling physical versions of content like books and posters.

Licensing a branded mobile platform strikes me as an interesting potential revenue stream. It reminds me of Mozilla's arrangement with Google: something that benefits users, and also allows money to be made without compromising on principles (unlike, say, introducing ads).

The human touch

2008-11-25T00:47:00.000+11:00

Google has recently released SearchWiki, a set of tools for annotating Google search results. It's a rather dramatic change to the main search page, accessible by anyone logged into a Google account.

The interesting thing is that it seems like a personal version of Wikia Search (personal in the sense that only your alterations change the order of results, although you can see comments from everyone), though earlier comparisons were made more to link sharing sites like Digg.

So, might the emergence of this tool mean that Google is not, unlike so many others, underestimating the potential of Wikia Search?

Bug statistics

2008-11-18T19:08:00.000+11:00

Since the beginning of September, the bug tracker for MediaWiki has been sending weekly updates to the Wikitech-l mailing list, with stats on how many bugs were opened and resolved, the type of resolution, and the top five resolvers for that week. With eleven weeks of data so far, some observations can be made.

The following graph shows the number of new, resolved, reopened and assigned bugs per week (dates given are the starting date for the week). The total number of bugs open that week is shown in blue, and uses the scale to the right of the graph:

The total number of open bugs has been trending upwards, but only marginally, over the past couple of months. It will be interesting to see, with further weekly data, where this trend goes.

It also seems that the number of bugs resolved in any given week tends to go up and down in tandem with the number of new bugs reported in that week. Although there is no data currently available on how quickly bugs are resolved, I would speculate that most of the "urgent" bugs are resolved within the week that they are reported, which would explain the correlation.

Note also the spike in activity in the week beginning 6th October; this was probably the result of the first Bug Monday.

The second graph shows the breakdown of types of bug resolutions:

The distribution seems fairly similar week on week, with most resolutions being fixes. It's interesting to note that regularly around 25% to 35% of bug reports are problematic in some way, whether duplicates or bugs that cannot be reproduced by testers.

The weekly reports are just a taste of the information available about current bugs; see the reports and charts page for much more statistic-y goodness. And kudos to the developers who steadily work away each week to handle bugs!

How Collective Wisdom Shapes Business, Economies, Societies and Nations (and Wikipedia articles)

2008-08-31T14:18:00.001+10:00

Alaska Governor Sarah Palin was selected as presumptive Republican presidential candidate John McCain's running mate on Friday, and her Wikipedia article has seen a predictable explosion in editing activity. From the article's creation in 2005, up until the announcement on Friday, the article had been edited something like 900 times. Since then, however, it's been edited nearly 2000 times again.

What's more interesting is how the article was edited before the announcement was made. Ben Yates mentions this NPR story detailing edits made to the page by a user called Young Trigg, who may or may not have been Palin herself (or someone on her staff). But Young Trigg was not the only person editing the article.

The Washington Post reports on some analysis done by "Internet monitoring" company Cyveillance, which found that Palin's article was edited more heavily in the days leading up to the announcement than any of the articles on the other prospects for the nomination. A similar pattern emerged in relation to the articles on the frontrunners for the Democratic vice-presidential nomination: Joe Biden's article was edited more heavily than the other potential picks in the leadup to his selection as Obama's running mate last week.

Also similar were the types of edits being made: both Palin and Biden's articles saw many footnoting and other accuracy-type edits in the leadup to the announcements of their selection. As a final piece of intrigue, the editors making these edits about Palin and Biden were far more likely to also be actively editing McCain and Obama's articles respectively than were the editors editing articles on the other potential nominees.

There are at least two explanations for these patterns. The first is that the two campaigns, knowing full well who the nominees would be, were editing the articles in advance of the announcement to ensure that they were accurate (or to take the cynical view, to ensure that they were favourable), knowing full well that Wikipedia would be one of the major sources of information for the public - and for journalists and campaign staff too - following the announcements.

The alternative is more interesting, to my mind. Cyveillance, who did the analysis, is usually in the business of data mining in the business world, aiming to collate disparate sources of public information to predict financial and commercial events before they are publicly announced. Wikipedia may be performing exactly the same function: a variety of editors collating disparate pieces of information in a far more powerful way than any individual could. It's already (un)conventional wisdom that the betting markets are equal or better predictors of elections than opinion polls are: a basic application of the efficient market hypothesis. In a similar way, high profile, highly edited Wikipedia articles like these are the marketplace of the information economy.

Userpage Google envy

2008-08-23T16:35:00.001+10:00

Brianna Goldberg, a Canadian journalist with the National Post, wrote on Friday about her efforts to become the number one Google result for her name. Her quest was sparked by discovering that the Wikipedia user page of another Brianna Goldberg was ensconced in the top spot.

The journalist Goldberg obtained advice from search engine optimization experts on methods for advancing her ranking, but still had difficulty displacing the userpage. Moreover, the article on the journalist comes in second to the userpage in results from Wikipedia. Wikipedia user pages are certainly highly visible: every time you sign an edit, you're creating a link to your user page.

This relates to a discussion from last month on the mailing list about whether user pages (and certain other types of pages) should be indexed by search engines at all. The Wikimedia sites already instruct search engines not to crawl certain pages, including deletion debates, requests for arbitration pages and requests for adminship, but there have regularly been calls for more types of pages to be restricted (see here for example).

So, should user pages be blocked from search engine crawlers?

US court groks free content licensing

2008-08-18T01:18:00.000+10:00

The US Court of Appeals for the Federal Circuit handed down an interesting and significant decision on Wednesday, which could have a number of valuable implications for the validity of free content licences.

The case, Jacobsen v Katzer, was about software for interfacing with model trains. Robert Jacobsen is the leader of the Java Model Railroad Interface project (JMRI), which releases its work under the Artistic License 1.0; Matthew Katzer (and his company Kamind Associates) produce commercial model train software products. It was alleged that either Katzer or another employee of Kamind took parts of the JMRI code and incorporated it into its own software, without identifying the original authors of the code, including the original copyright notices, identifying the JMRI project as the source of the code, or indicating how it had modified the original JMRI code.

Jacobsen sought an interlocutory injunction, arguing that since Katzer and Kamind had breached the Artistic License, their use of the JMRI code constituted copyright infringement. However, the District Court considered that Jacobsen only had a cause of action for breach of contract, not for copyright infringement, and because of this Jacobsen could not satisfy the irreparable harm test (in the case of copyright infringement, irreparable harm is presumed in the 9th Circuit), and was not entitled to an injunction.

Jacobsen's appeal to the Court of Appeals was against this preliminary finding. An assortment of free content bodies (including Creative Commons and the Wikimedia Foundation) appeared as amici curiae in the case, submitting an interesting brief containing a number of arguments that the Court of Appeals seemed to agree with.

The legal issue at stake in the appeal concerned the difference between conditions of a contract and ordinary promises (covenants, in US parlance). If a term in a contract is a condition, then the promisee has a right to terminate the contract. In the context of a copyright licence, if someone using the licensed material breaches a condition of the licence, they are then open to a copyright infringement action (unless they have some other legal basis for using the material). Contract law will still hold someone responsible for breaching a contractual promise, but the remedies are different, and as was the issue here, it's much harder to get an interlocutory injunction.

Whether or not a term is a condition is a matter of construction, and depends on the intention of the parties. In answering the question of whether the relevant terms were conditions, the Court of Appeals made a number of important observations which are applicable to free content licences generally.

The first observation was that, just because with free content licensing there is no money changing hands, it is not the case that there can be no economic consideration involved. The Court recognised several other forms of economic benefit which free content licensors derive from licensing their works:

"There are substantial benefits, including economic benefits, to the creation and distribution of copyrighted works under public licenses that range far beyond traditional license royalties. For example, program creators may generate market share for their programs by providing certain components free of charge. Similarly, a programmer or company may increase its national or international reputation by incubating open source projects."

This is a really significant observation for the court to make, because there are some major ideological barriers that seemed to get in the way of the District Court on this point. Even though free content licensing is all about authors dealing with their economic rights under copyright, free content is all too often viewed as non-economic. Just because free content doesn't fit in with the traditional royalties-based system, it does not mean that there are not real economic motives involved.

The second observation was made in the context of the general rule (applicable in that jurisdiction) that an author who grants a non-exclusive licence effectively waives their right to sue for copyright infringement. If the relevant terms were conditions, then they would be capable of serving as limitations on the scope of the licence, which would negate this rule. The Court said that:

"[t]he choice to exact consideration in the form of compliance with the open source requirements of disclosure and explanation of changes, rather than as a dollar-denominated fee, is entitled to no less legal recognition."

Again, this seems to be an important point in terms of getting over psychological hurdles. The District Court was clearly hung up on the terms in the Artistic License allowing users to freely distribute and modify licensed material; it focused on the breadth of the freedoms granted. In doing so it overlooked that while the License did grant broad freedoms, it clearly circumscribed them. The Court of Appeals understood what the District Court did not: that releasing material under a free licence is not the same as giving it away.

The heart of the decision was of course about the particular wording in the Artistic License. The use of the phrase "provided that" in the Artistic License was significant, because such wording usually indicates a condition under Californian contract law. Further, the requirement that any copies distributed be accompanied by the original copyright notice - a relatively common term - also typically indicates a condition.

In the end, the Court of Appeals decided that the relevant terms were conditions, and that Jacobsen had a copyright infringement action open to him. Since the District Court didn't assess Jacobsen's prospects of success on the merits, the Court of Appeals remanded the injunction application back to them for their consideration. Given that Katzer and Kamind apparently conceded that they did not comply with the Artistic License, Jacobsen would seem a good chance to get his injunction, and later to succeed at the merits stage.

Though much turned on the particular wording here, the reasoning behind the assessment of the terms can easily be applied to other free content licences, as can the recognition of the economic motives involved in free content licensing, motives which, though non-traditional, are both legitimate and worthy of protection by the law. Independent of any value as a binding precedent, this case is a magnificent example of a court really appreciating the vibe of free content.

Kno contest

2008-07-28T22:36:00.001+10:00

Google's Knol was opened to the public last week, to much fanfare. When Knol was announced in December last year, it was immediately compared to Wikipedia, and the comparisons keep coming now that it has launched. However, as I wrote at the time, the comparison seemed to be wide of the mark in many important ways. Now that Knol has launched and we can see how it will actually work, I think the accuracy of the comparison is still not borne out.

The three key differences I noted at the time were the lack of collaboration in writing knols, the plurality of knols (more than one on the same subject) and that knols will not necessarily be free content, differences which go to the core of what makes Wikipedia what it is.

As it turns out, Knol does provide a couple of options for collaboration, allowing authors to moderate contributions from the public, or to allow public contributions to go live immediately, wiki-style. The other mode is closed collaboration, but it does allow for multiple authors at the invitation of the original author.

As the sample knol hinted, Knol does provide for knols to be licensed under the CC-BY 3.0 licence by default, and allows authors to choose the CC-BY-NC 3.0 licence, or to reserve all rights to the content. However, these are the only licences available; in particular, no copyleft licences are available.

Of course, the thing to remember is that Knol is an author-oriented service, so even if an author selects open collaboration and the CC-BY licence, it appears that they can change their minds at any time, and, for example, close collaboration on a previously open knol. (I might need to do some closer reading of the terms of service, but it would also appear possible to revert the Knol-published version to an all-rights-reserved model, too.)

The author-oriented approach is apparent in most of the features of Knol. On a knol's page you don't see links to similar knols, or knols on related topics (as you would on a Wikipedia article); you see links to knols written by the same author. Knols aren't arranged with any kind of information structure like Wikipedia categories, or even tags; the URLs are hierarchical, but there, knols are gathered under the author's name.

No, Knol is not a competitor to Wikipedia (or at least, it's competing for a different market segment). It's more a companion to another Google property, Blogger. It's a publishing platform, but not for diary-style, in-the-moment transient posts; it's for posts that are meant to be a little more timeless, one-off affairs. Google say so at their Blogger Buzz blog:

"Blogs are great for quickly and easily getting your latest writing out to your readers, while knols are better for when you want to write an authoritative article on a single topic. The tone is more formal, and, while it's easy to update the content and keep it fresh, knols aren't designed for continuously posting new content or threading. Know how to fix a leaky toilet, but don't want to write a blog about fixing up your house? In that case, Knol is for you."

Some of the content on Knol might start off looking like Wikipedia articles, but over time I'll bet that the average "tone" of knols will find a middle ground between blogs and Wikipedia's "encyclopaedic" tone as people come to use Knol as a companion to blogging.

Rambot redux

2008-06-02T02:26:00.000+10:00

FritzpollBot is a name you're likely to be hearing and seeing more of: it's a new bot designed to create an article on every single town or village in the world that currently lacks one, of which there are something like two million. The bot gained approval to operate last week, but there's currently a village pump discussion underway about it.

FritzpollBot has naturally elicited comparisons with rambot, one of the earliest bots to edit Wikipedia. Operated by Ram-Man, first under his own account and then under a dedicated account, rambot created stubs on tens of thousands of cities and towns in the United States starting in late 2002.

It's hard for people now to get a sense of what rambot did, but its effects even now can be seen. All told, rambot's work represented something close to a doubling of Wikipedia's size in a short space of time (the bulk of the work, more than 30,000 articles, being done over a week or so in October 2002). The noticeable bump that it produced in the total article count can still be seen in present graphs of Wikipedia's size. Back then the difference was huge. I didn't join the project until two years after rambot first operated, but even then around one in ten articles had been started by rambot, and one would run into them all the time.

During its peak, rambot was adding articles so fast that the growth rate per day achieved in October 2002 has never been outstripped, as can be seen from the graph below (courtesy Seattle Skier at Commons):

There was some concern about rambot's work at the time: see this discussion about rambot stubs clogging up the Special:Random system, for example. There were also many debates about the quality and content of the stubs, many of which contained very little information other than the name and location of the town.

The same arguments that were made against rambot at the time, mainly to do with the project's ability to maintain so many new articles all at once, are being made again with respect to FritzpollBot. In the long run, the concerns about rambot proved to be ill-founded, as the project didn't collapse, and most (if not all) of the articles have now been absorbed into the general corpus of articles. The value of its work was ultimately acknowledged, and now there are many bots performing similar tasks.

In addition to the literal value of rambot's contributions, there's a case to argue that the critical mass of content that rambot added kickstarted the long period of roughly exponential growth that Wikipedia enjoyed, lasting until around mid-2006. I don't think it's unreasonable to suggest that having articles on every city or town in the United States, even if many were just stubs, was a significant boon for attracting contributors. From late 2002 on, every American typing their hometown or their local area into their favourite search engine would start to turn up Wikipedia articles among the results, undoubtedly helping to attract new contributors. The stubs served as a base for redlinks, which in turn helped build the web and generate an imperative to create content. Repeating the process for the rest of the world, as FritzpollBot promises to do, would thus be an incredibly valuable step.

Furthermore, as David Gerard observes, when rambot finished its task the project had taken its first significant step towards completeness on a given topic. Rambot helped the project make its way out of infancy; now in adolescence, systemic bias is one of the major challenges it faces, and hopefully FritzpollBot can help existing efforts in this regard. Achieving global completeness across a topic area as significant as the very places that humans live would be a massive accomplishment for the project.

Let's see those Ws really cover the planet.

Wikipedia to be studied in New South Wales from 2009

2008-05-26T12:07:00.005+10:00

The Board of Studies in the Australian state of New South Wales, which sets the syllabus for high school students across the state, has included Wikipedia as one of the texts available for study in its "Global Village" English electives, according to The Age.

The new syllabus will apply from 2009-2012, and (certain selected parts of) Wikipedia will be one of four texts available in the elective. It will be up to teachers to choose which text is studied, so there are no guarantees that Wikipedia will actually be studied in New South Wales :) According to the syllabus documentation (DOC format), the other alternatives are the novel The Year of Living Dangerously, about the downfall of Sukarno and the rise of Suharto in Indonesia in 1965; the play A Man with Five Children; and the modern classic film The Castle.

I think formalised educational study of Wikipedia is going to be very important in the future, as the reality of its success and its widespread use coincides with a long period of neglect of skills in critically evaluating source material in many schools, certainly in this country. Thankfully the people at the Board of Studies seem to get this. There's also a good quote from Greg Black at the non-profit educational organisation education.au:

"The reality is that schools and schools systems are going to have to engage with this whether they like it or not... what the kids really need to learn about is whether it's fit for purpose, the context, the relevance, whether there's an alternative view - an understanding about how to use information in an effective way."

And, just for good measure, The Age article features a quote from Privatemusings, exhorting students to "plug in". Indeed, some good advice.

Citationschadenfreude

2008-04-26T01:12:00.003+10:00

Oh dear. Queensland-based Griffith University has been copping flak over the past few days for asking the government of Saudi Arabia to contribute money towards its Islamic Research Unit, having previously accepted a smaller grant last year. One state judge branded the university "an agent of extreme Islam".

The university's vice-chancellor Ian O'Connor defended accepting the grant and seeking further money, but was today busted by The Australian newspaper for lifting parts of his defence from Wikipedia's article on Wahhabism. Worse, the change that was made to the copied text rendered it inaccurate. The irony that O'Connor had previously gone on the record recommending that Griffith students not use Wikipedia was not lost on the media.

The good news is that while there's been plenty of criticism from journalists and commentators of O'Connor, both for copying in the first place and then for introducing a pretty dumb mistake with his change, there've been no reported problems with the Wikipedia material O'Connor copied. Furthermore, the copy of O'Connor's response on the Griffith website now properly quotes and footnotes Wikipedia. Score one to us, I think :)

Wikipedia's downstream traffic

2008-03-29T00:13:00.004+11:00

We've been hearing for a while about where Wikipedia's traffic comes from, but here are some new stats from Heather Hopkins at Hitwise on where traffic goes to after visiting Wikipedia. Hopkins had produced some similar stats back in October 2006, and it's interesting to compare the results.

Wikipedia gets plenty of traffic from Google (consistently around half) and indeed other search engines, but what's interesting is that nearly one in ten users go back to Google after visiting Wikipedia, making it the number one downstream destination. Yahoo! is also a popular post-Wikipedia destination.

It was nice to see that Wiktionary and the Wikimedia Commons both make it into the top twenty sites visited by users leaving Wikipedia.

Hopkins also presents a graph illustrating destinations broken down by Hitwise's categories. More than a third of outbound traffic is to sites in the "computers and internet" category, and around a fifth to sites in the "entertainment" category, which probably ties in with the demographics of Wikipedia readers, and the general popularity of pop culture, internet and computing articles on Wikipedia.

Hopkins makes another interesting point on the categories, that large portions of the traffic in each category are to "authority" sites:

"Among Entertainment websites, IMDB and YouTube are authorities. Among Shopping and Classifieds it's Amazon and eBay. Among Music websites it's All Music Guide For Sports it's ESPN. For Finance it's Yahoo! Finance. For Health & Medical it's WebMD and United States National Library of Medicine."

Similarly, Doug Caverly at WebProNews states that the substantial proportion of traffic returning to search engines after visiting Wikipedia "probably indicates that folks are continuing their research elsewhere", and this ties in well with Hopkins' observation about the strong representation of reference sites.

All of this suggests that Wikipedia is being used the way that it is really meant to be used: as a first reference, as a starting point for further research.

Protection and pageviews

2008-03-17T01:13:00.001+11:00

Henrik's traffic statistics viewer, a visual interface to the raw data gathered by Domas Mituzas' wikistats page view counter, has generated plenty of interest among the Wikimedia community recently. Last week Kelly Martin, discussing the list of most viewed pages, wondered how many page views are of protected content; that thought piqued my interest, so I decided to dust off the old database and calculator and try to put a number to that question.

The data comes from the most viewed articles list covering the period from 1 February 2008 to 23 February 2008. I've used that data, and data on protection histories from the English Wikipedia site, to come up with some stats on page protection and page views. There are some limitations: I don't have gigabytes of bandwidth available, so some of the stats (on page views in particular) are estimates, and protection logs turn out to be pretty difficult to parse, so I've focused on collecting duration information rather than information on the type of protection (full protection, semi-protection etc). Maybe that could be the focus of a future study.

There were 9956 pages in the most viewed list for February 1 to February 23 2008. Excluding special pages, images and non-content pages, there were 9674 content pages (articles and portals) in the list. Interestingly, only 3617 of these pages have ever been protected, although each page that has been protected at least once has, on average, been under protection nearly three times.

Protection statistics

Only 1223 (12.6%, about an eighth) of the pages were edit protected at some point during the sample period, 902 of those for the entire period (a further 92 were move protected only at some point, 69 of those for the entire period). Each page that had some period of protection was protected for, on average, 82.9% of the time (just under 20 days), though if the pages protected for the whole period are excluded, the average period spent protected was only 34.8% of the time (just over eight days).
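For anyone who wants to check how those averages hang together, here is a minimal sketch in Python. It uses only the figures quoted above; the variable names are mine, and the 82.9% figure is itself rounded, so the derived numbers are approximate.

```python
# Figures quoted in the post; variable names are illustrative only.
content_pages = 9674           # content pages in the most viewed list
edit_protected = 1223          # pages edit protected at some point in the period
protected_whole_period = 902   # of those, protected for the entire period
sample_days = 23               # 1 February to 23 February 2008, inclusive
mean_share_all = 0.829         # mean share of the period spent protected (rounded)

# Share of content pages that saw any edit protection.
share_protected = edit_protected / content_pages

# Pages protected for the whole period each contribute a share of 1.0,
# so the mean share for the remaining pages can be backed out.
partly_protected = edit_protected - protected_whole_period
mean_share_partial = (
    mean_share_all * edit_protected - protected_whole_period
) / partly_protected

print(f"share of pages protected: {share_protected:.1%}")
print(f"mean days protected, all pages: {mean_share_all * sample_days:.1f}")
print(f"mean share, partly protected pages: {mean_share_partial:.1%}")
print(f"mean days, partly protected pages: {mean_share_partial * sample_days:.1f}")
```

The backed-out share for the partly protected pages comes to roughly 35%, or about eight days, which agrees with the figure given above.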

The following graph shows the distribution of the portion of the sample period that pages spent protected, rounded down to the nearest ten percent:

The shortest period of protection during the period was for Vicki Iseman, protected on 21 February by Stifle, who thought better of it and unprotected just 38 seconds later.

Among the most viewed list for February, the page that has been protected the longest is Swastika, which has been move protected continuously since 1 May 2005 (more than 1050 days). The page that has been edit protected the longest is Marilyn Manson (band), which has been semi-protected since 5 January 2006 (more than 800 days).
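The day counts are easy enough to reproduce. The sketch below assumes the reference date is the date of this post (17 March 2008) and that both protections ran without interruption; the names are just for illustration.

```python
from datetime import date

# Assumption: durations are measured up to the date of this post.
reference_date = date(2008, 3, 17)

swastika_move_protected = date(2005, 5, 1)
manson_semi_protected = date(2006, 1, 5)

# Subtracting two dates gives a timedelta; .days is the whole-day count.
swastika_days = (reference_date - swastika_move_protected).days
manson_days = (reference_date - manson_semi_protected).days

print(swastika_days)  # 1051
print(manson_days)    # 802
```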

Interestingly, the average length of a period of edit protection across these articles (through their entire history) is around 46 days and 16 hours, whereas the average length of a period of move protection is lower, at 41 days 14 hours. I had expected the average bout of move protection to last longer, although almost all edit protections do include move protections.

The next graph shows the distribution of protection lengths across the history of these pages, for periods of protection up to 100 days in length (the full graph goes up to just over 800 days):

Note the large spikes in the distribution at seven and fourteen days, the smaller spike at twenty-one days and the bump from twenty-eight to thirty-one days, corresponding to protections of four weeks or one month duration (MediaWiki uses calendar months, so one month's protection starting January will be 31 days long, whereas one month's protection starting September will be 30 days long).
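To illustrate the calendar month effect, here is a small Python sketch. It is not MediaWiki's own code (which is written in PHP); the function names are hypothetical, and clamping the day for short months is my assumption, made only so the function works for any start date.

```python
import calendar
from datetime import date


def add_one_month(start: date) -> date:
    """Return the same day in the following calendar month.

    The day is clamped to the last day of the target month
    (an assumption for this sketch, not MediaWiki's documented behaviour).
    """
    year = start.year + (1 if start.month == 12 else 0)
    month = 1 if start.month == 12 else start.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def one_month_length(start: date) -> int:
    """Number of days covered by a one-month protection starting on `start`."""
    return (add_one_month(start) - start).days


print(one_month_length(date(2008, 1, 1)))  # 31
print(one_month_length(date(2008, 9, 1)))  # 30
```

A fixed four-week protection, by contrast, is always 28 days, which is why the graph shows a bump spread across 28 to 31 days rather than a single spike.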

The final graph shows the average length of protection periods (orange) and the number of protection periods applied (green) in each month, over the last four years:

At least on these generally popular articles, protection got really popular towards the end of 2006 into the beginning of 2007, and again a year later. However, it seems that protection lengths peaked around the middle of 2007 and have been in decline since then.

Protection and pageviews

What really matters here though is the pageviews. The 9674 content pages in the most viewed list were viewed a total of 805,569,269 times over the relevant period. The 1223 pages that were edit protected for at least part of the period were viewed a total of 270,057,550 times (33.5%), with approximately 247 million of these pageviews coming while the pages were protected.

This is a really substantial number of pageviews. However, it includes the Main Page, which alone accounts for more than 114 million of those pageviews. Leaving the Main Page out of the equation gives a healthier figure of around 133 million views to protected pages during the relevant period (and remember, this is only counting pages on the most viewed list).
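The arithmetic behind those pageview figures can be sketched as follows. The inputs are the numbers quoted above; the 247 million and 114 million figures are estimates, rounded here to the nearest million, so the results are approximate.

```python
# Figures quoted in the post; the last two are rounded estimates.
total_views = 805_569_269            # views of all 9674 content pages
views_of_protected = 270_057_550     # views of pages protected at some point
views_while_protected = 247_000_000  # approximate views while actually protected
main_page_views = 114_000_000        # approximate Main Page views

# Share of all views going to pages that were protected at some point.
share_of_protected_pages = views_of_protected / total_views

# Views of protected content once the Main Page is set aside.
views_excluding_main_page = views_while_protected - main_page_views

print(f"{share_of_protected_pages:.1%}")
print(f"{views_excluding_main_page:,}")
```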

Conclusions

Although only one in eight of the pages in the most viewed list were protected at some point during the relevant period, they tended to be higher-profile ones, accounting for one third of the page views. The pages that were protected at some point tended to be protected a lot of the time, three-quarters of them for the entire sample period. This certainly fits with what many people have already suspected, that a small pool of high-profile articles attract plenty of attention in the form of both page protection and page views.

It will be interesting to do some more analysis on the history of page protection. Based on just this small sample, it seems that average protection lengths are trending downwards, which could well be something to do with the advent of timed protection. Hopefully I'll have some more insights to come.

Today's lesson from social media

2008-03-16T22:38:00.003+11:00

So much for claims that Jimmy Wales uses his influence to alter content for his friends: according to Facebook's Compare People application, Jimmy may be the best listener, the best scientist and the most fun to hang out with for a day, but he's nowhere to be seen in the list of people most likely to do a favour for me.

Beware corners

2008-02-20T03:13:00.001+11:00

I'm sure that everyone who follows the news around Wikipedia will be aware of the latest controversy to gain attention in the media, namely the dispute about the inclusion of certain images in Wikipedia's article on Muhammad. Much of the external attention has focused on an online petition that calls for the removal of the images which, at the time of writing, has more than 200,000 signatures.

The debate so far has been understandably robust. Unfortunately, issues like these tend to harden positions, and push people towards the extremes. Consider a recent example: the seventeen Danish newspapers who, in the wake of the arrest of three men suspected of planning to assassinate Kurt Westergaard, author of one of the cartoons at the heart of the Jyllands-Posten Muhammad cartoons controversy, republished the cartoons in retaliation.

Likewise, positions are being hardened in this debate among both supporters and opponents of the images. The relevant talk pages are remarkably free of comments (from either side) even contemplating compromise. The Foundation is receiving emails on the one hand giving ultimatums that the images be removed, and on the other exhorting the Foundation not to "give in" to "these muslims [sic]".

Retreating into corners like this is contrary to the ethos of Wikipedia, which operates on open discussion in pursuit of neutrality. So just as the supporters of the images are asking opponents to challenge their assumptions, so too should the supporters be prepared to challenge their own.

The first assumption that should be questioned is that the images are automatically of encyclopaedic value. Images have little value in an encyclopaedia unless used in a relevant context and given sufficient explanation. Take this image, for example. An interesting image, but unless it is explained that it appears in Rashid al-Din's 14th century history Jami al-Tawarikh, and the observation made that it is thought to be the earliest surviving depiction of Muhammad, it lacks its true significance. We have a whole article on depictions of Muhammad. While some of the images in it are discussed directly, many are merely presented in a gallery, without much text to indicate their importance or relevance.

The second assumption worth revisiting is that, since the images were created by Muslim artists, there are no neutral point of view problems. This view overlooks the fact that there are many different traditions within Islam, not only religious ones but artistic ones also. The Almohads, for example, with their Berber and eventual Spanish influences, had a vastly different cultural and artistic heritage from the Mongol-, Turkic- and Persian-influenced Timurids. The Fatimids of Mediterranean Africa had different influences again from the Kurdish Ayyubids.

The Commons gallery for Muhammad contains an abundance of medieval Persian and Ottoman depictions, a small handful of Western depictions, but only one calligraphic depiction, and no architectural ones. Calligraphy is extremely significant in Islamic art, given the primacy of classical Arabic as a liturgical language in all Islamic traditions. It's worth considering why there is such an over-representation of Persian and Ottoman works, and such a dearth of works from other Islamic traditions. It's also worth considering whether the Western preference for natural representations, as opposed to the abstract representations preferred in most Islamic traditions, has informed the predominance of physical depictions of Muhammad in the English Wikipedia and on Commons.

These images should not be removed altogether; many come from historically significant works, and represent a significant artistic tradition. But the images - as with any other content on Wikipedia - ought to be used in appropriate and expected contexts, and ought not be used exclusively or primarily to illustrate these articles, but should be accompanied by images representative of other traditions.

Most of all, discussions on these questions should proceed openly and freely, and all participants should make an effort to question their assumptions, and move away from their corners.

Knol worries

2007-12-19T12:40:00.000+11:00

Last week Google announced an invite-only trial of a new tool called Knol (their name for a "unit of knowledge") to allow people to write an information page on a subject which can then be rated, reviewed or commented on by others. The central idea, as Google's VP of Engineering Udi Manber put it, is authorship:

"Books have authors' names right on the cover, news articles have bylines, scientific articles always have authors -- but somehow the web evolved without a strong standard to keep authors names highlighted. We believe that knowing who wrote what will significantly help users make better use of web content."

You can see a sample knol here (isn't everyone just itching to edit out that spelling mistake in the first sentence?).

Most of the press coverage of Knol is positioning it as a competitor to Wikipedia, but is it really? We won't know what Knol will really be like until it is open to the public (presuming it makes it out of private beta), but from the looks of things it differs markedly from Wikipedia in all of the most important ways in which Wikipedia is unique.

Firstly, knols won't be collaboratively written: Google says that the Knol platform will include "strong community tools", enabling the general unwashed to submit changes to knols (they use the name for individual articles too) as well as review, rate and comment on them, but ultimately the content of knols will be controlled by their original authors. Obviously, this is different from Wikipedia's collaborative wiki editing model under which no-one owns articles.

Secondly, there will likely be multiple knols on any given subject: as they say in the Knol announcement, it will be Google's job to appropriately rank knols in search results. Presumably they'll make use of the rating and reviewing tools in the platform as well as standard metrics like PageRank to try to work out which knol really is the most authoritative on a subject. Again, this is clearly different from Wikipedia, with its single-voice, neutral point of view system.

Thirdly, knols will not necessarily be free content: while the sample knol mentioned above has a CC-BY 3.0 licence displayed on it, there are no indications that such licensing will be required, and Knol's "author control" vibe probably indicates that each author will get to choose the licence for their knols.

Kevin Newcomb at Search Engine Watch thinks a better comparison for Knol is Squidoo, and to an extent Mahalo, "since it allows users to build authority and sign their work [and aims] to build content pages that rank highly in search engines." Danny Sullivan at Search Engine Land also draws the comparison between Knol and Squidoo, and suggests that Knol is more likely an attempt by Google to carve out a niche of its own in the 'knowledge aggregation' industry rather than an effort to compete directly with any of the projects in the field.

However, it's perhaps best to think of the Knol proposal less as a project and more as a platform: Rafe Needleman at Webware compares Knol with Google's existing text publishing platform, Blogger, though with "Digg-like elements". Knol authors will build reputation, like blog authors (though Knol will be more about discrete articles than a stream of them), and users will rate and review competing knols in much the way that Digg and similar link-sharing sites operate. I think this is the best comparison, and it fits well with the strong focus in the Knol announcement on individuality and authorship, and on Google's planned hands-off approach. There's certainly a niche available for this kind of publishing.

So, presuming Knol goes public one day, it may well garner a significant slice of search results and a place in the knowledge business, but with its author-driven multiple-voice model and basis as essentially a publishing platform, it is more likely to be a complement to Wikipedia than a competitor.

Mr Wales Goes to Washington

2007-12-12T22:35:00.000+11:00

Jimmy Wales testified before the United States Senate Committee on Homeland Security and Governmental Affairs on Tuesday (Washington time) on collaborative technologies generally and the Wikimedia projects specifically, and on how they relate to e-Government initiatives in the United States.

Among other things, Jimmy discussed the use of both internal and public-facing wikis for governmental communication, explained semi-protection to Joe Lieberman, and outlined the benefits of systems that are designed to be open rather than closed. Also testifying were Karen Evans from the US government's Office of Management and Budget, John Needham from Google, and Ari Schwartz from the Center for Democracy and Technology.

You can view video of the hearing on YouTube or download it here in Real format (be aware that it's two hours long), see Jimmy's prepared testimony here (PDF) or see other statements here.