Monday, 17 March 2008

Protection and pageviews

Henrik's traffic statistics viewer, a visual interface to the raw data gathered by Domas Mituzas' wikistats page view counter, has generated plenty of interest among the Wikimedia community recently. Last week Kelly Martin, discussing the list of most viewed pages, wondered how many page views are of protected content; that thought piqued my interest, so I decided to dust off the old database and calculator and try to put a number to that question.

The data comes from the most viewed articles list covering the period from 1 February 2008 to 23 February 2008. I've used that data, and data on protection histories from the English Wikipedia site, to come up with some stats on page protection and page views. There are some limitations: I don't have gigabytes of bandwidth available, so some of the stats (on page views in particular) are estimates, and protection logs turn out to be pretty difficult to parse, so I've focused on collecting duration information rather than information on the type of protection (full protection, semi-protection, etc.). Maybe that could be the focus of a future study.

There were 9956 pages in the most viewed list for 1 February to 23 February 2008. Excluding special pages, images and non-content pages, there were 9674 content pages (articles and portals) in the list. Interestingly, only 3617 of these pages have ever been protected, although each page that has been protected at least once has been protected nearly three times on average.

Protection statistics

Only 1223 of the pages (12.6%, about an eighth) were edit protected at some point during the sample period, 902 of them for the entire period. A further 92 were move protected only at some point, 69 of those for the entire period. Each page that had some period of protection was protected for, on average, 82.9% of the time (about 19 days of the 23), though if the pages protected for the whole period are excluded, the average period spent protected was only 34.8% of the time (just over eight days).
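As a quick sanity check, the averages above can be recombined from the counts quoted in the post. This is only a sketch: the variable names are mine, and it assumes the sample period is 23 days (1 to 23 February inclusive) and that pages protected throughout count as 100%.

```python
# Figures quoted in the post (variable names are illustrative).
content_pages = 9674
edit_protected = 1223
protected_whole_period = 902
partial_share = 0.348   # average share for pages NOT protected throughout
sample_days = 23        # assumption: 1 to 23 February 2008, inclusive

# Share of content pages that were edit protected at some point.
share_of_pages = edit_protected / content_pages

# Weighted average: whole-period pages count as 100%, the rest as 34.8%.
partial_pages = edit_protected - protected_whole_period
overall_share = (protected_whole_period + partial_pages * partial_share) / edit_protected

print(f"pages edit protected: {share_of_pages:.1%}")
print(f"average time protected: {overall_share:.1%}")
print(f"in days: {overall_share * sample_days:.1f}")
print(f"partial pages only, in days: {partial_share * sample_days:.1f}")
```

The weighted average comes back to 82.9%, which suggests the two averages in the paragraph are consistent with each other.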

The following graph shows the distribution of the portion of the sample period that pages spent protected, rounded down to the nearest ten percent:


The shortest period of protection during the period was for Vicki Iseman, protected on 21 February by Stifle, who thought better of it and unprotected just 38 seconds later.

Among the most viewed list for February, the page that has been protected the longest is Swastika, which has been move protected continuously since 1 May 2005 (more than 1050 days). The page that has been edit protected the longest is Marilyn Manson (band), which has been semi-protected since 5 January 2006 (more than 800 days).
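The day counts are simple date subtraction. A minimal sketch, assuming the durations are measured up to the date of this post (17 March 2008):

```python
from datetime import date

# Assumption: durations are counted up to the date of this post.
as_of = date(2008, 3, 17)

move_protected_since = date(2005, 5, 1)   # Swastika
semi_protected_since = date(2006, 1, 5)   # Marilyn Manson (band)

swastika_days = (as_of - move_protected_since).days
manson_days = (as_of - semi_protected_since).days

print(swastika_days)  # 1051
print(manson_days)    # 802
```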

Interestingly, the average length of a period of edit protection across these articles (through their entire history) is around 46 days and 16 hours, whereas the average length of a period of move protection is lower, at 41 days 14 hours. I had expected the average bout of move protection to last longer, although almost all edit protections do include move protections.

The next graph shows the distribution of protection lengths across the history of these pages, for periods of protection up to 100 days in length (the full graph goes up to just over 800 days):

Note the large spikes in the distribution at seven and fourteen days, the smaller spike at twenty-one days and the bump from twenty-eight to thirty-one days, corresponding to protections of four weeks' or one month's duration (MediaWiki uses calendar months, so one month's protection starting in January will be 31 days long, whereas one month's protection starting in September will be 30 days long).
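To illustrate why "one month" lands anywhere from 28 to 31 days, here is a rough sketch of calendar-month arithmetic. It is not MediaWiki's actual expiry code; add_one_month is a hypothetical helper, and clamping to the end of the month is my own assumption about how to handle dates like 31 January.

```python
import calendar
from datetime import date

def add_one_month(start: date) -> date:
    """Return the same day in the following month (hypothetical helper).

    The day is clamped to the last day of the target month, an
    assumption made here so that 31 January maps to a valid date.
    """
    if start.month == 12:
        year, month = start.year + 1, 1
    else:
        year, month = start.year, start.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))

for start in (date(2008, 1, 10), date(2008, 9, 10)):
    expiry = add_one_month(start)
    print(start, "->", expiry, (expiry - start).days, "days")
```

A protection set on 10 January runs 31 days, and one set on 10 September runs 30, matching the spread of the bump in the graph.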

The final graph shows the average length of protection periods (orange) and the number of protection periods applied (green) in each month, over the last four years:

At least on these generally popular articles, protection got really popular towards the end of 2006 into the beginning of 2007, and again a year later. However, it seems that protection lengths peaked around the middle of 2007 and have been in decline since then.

Protection and pageviews

What really matters here though is the pageviews. The 9674 content pages in the most viewed list were viewed a total of 805,569,269 times over the relevant period. The 1223 pages that were edit protected for at least part of the period were viewed a total of 270,057,550 times (33.5%), with approximately 247 million of these pageviews coming while the pages were protected.

This is a really substantial number of pageviews. However, it includes the Main Page, which alone accounts for more than 114 million of those pageviews. Leaving the Main Page out of the equation gives a healthier figure of around 133 million views to protected pages during the relevant period (and remember, this is only counting pages on the most viewed list).
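The arithmetic behind these figures, as a short sketch. The two rounded inputs are approximations taken from the text ("approximately 247 million" and "more than 114 million"), so the result is a ballpark rather than an exact count.

```python
# Exact totals quoted in the post.
total_views = 805_569_269
views_of_edit_protected_pages = 270_057_550

# Rounded figures from the text (approximations, not exact counts).
views_while_protected = 247_000_000
main_page_views = 114_000_000

# Share of all views that went to pages protected at some point.
share = views_of_edit_protected_pages / total_views

# Views to protected pages once the Main Page is left out.
excluding_main_page = views_while_protected - main_page_views

print(f"{share:.1%}")
print(f"{excluding_main_page / 1e6:.0f} million")
```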

Conclusions

Although only one in eight of the pages in the most viewed list were protected at some point during the relevant period, they tended to be higher-profile ones, accounting for one third of the page views. The pages that were protected at some point tended to be protected a lot of the time, three-quarters of them for the entire sample period. This certainly fits with what many people have already suspected: that a small pool of high-profile articles attracts plenty of attention in the form of both page protection and page views.

It will be interesting to do some more analysis on the history of page protection. Based on just this small sample, it seems that average protection lengths are trending downwards, which could well be something to do with the advent of timed protection. Hopefully I'll have some more insights to come.
