Friday 6 April 2007

Wikimedia traffic patterns

Daniel Tobias recently posted to the English Wikipedia mailing list about Alexa's traffic statistics for Wikipedia, suggesting, among other things, that page views seem to peak on Sundays. However, it has often been noted that Alexa's data has its problems (for starters, it only samples people who have the Alexa toolbar installed), so I much prefer to look at our own statistics: the request graphs and traffic graphs hosted on the toolserver by Leon Weber (although it seems the script was written by someone else).

Let's take a look at a weekly graph (you can see the current weekly graph here):



All of these graphs are based on UTC. The black vertical line marks the beginning of a new week, which starts on a Monday. As you can see, the lowest days across this sample period are Saturday and Sunday, with the highest falling between Monday and Thursday. It seems safe to infer that more people use Wikimedia projects during the working week than on the weekend.

Now let's look at a daily graph (you can see the current daily graph here):



The black vertical lines on this graph represent midnight UTC. The daily peak across all clusters occurs around 14:00 to 21:00 UTC, with a fairly steep decline on either side of that time period.

But that's the overall data. Things start to get interesting when you break it down into clusters.

Looking at the pmtpa and images clusters (the blue part of the graph), there's a fairly sustained high level from around 14:00 UTC to around 04:00 UTC the next day. It's a little hard to tell with the stacking graph, but the knams and knams-img clusters (green) both seem to have a sustained high level from around 08:00 to 22:00 UTC. Finally, the yaseo and yaseo-img clusters (yellow) seem to get the most traffic between 02:00 and 16:00 UTC.

Why the different times? Well, the pmtpa and images clusters are in Tampa, Florida (the Power Medium datacenter), so the peak there, from 14:00 to 04:00 UTC, corresponds to the period from 10 am to midnight on the US East Coast and 7 am to 9 pm on the US West Coast (both currently on daylight saving time). So the peak for the Tampa cluster essentially corresponds to waking hours in the US. The knams and knams-img clusters are in Amsterdam, the Netherlands (hosted by Kennisnet), and the local peak there corresponds to waking hours across Europe. Finally, yaseo and yaseo-img are in Seoul, South Korea (hosted by Yahoo!), and their local peak, not surprisingly, corresponds to waking hours in East Asia.
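For anyone who wants to check the time zone arithmetic for the Tampa peak, here is a minimal Python sketch. The fixed offsets (UTC-4 for the East Coast, UTC-7 for the West Coast) are my assumption, based on US daylight saving time being in effect in early April 2007, and the helper name local_window is made up purely for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for early April 2007 (US daylight saving time in effect).
US_EASTERN = timezone(timedelta(hours=-4), "EDT")
US_PACIFIC = timezone(timedelta(hours=-7), "PDT")


def local_window(start_utc_hour, end_utc_hour, tz):
    """Convert a UTC hour window into (start, end) local clock times."""
    day = datetime(2007, 4, 6, tzinfo=timezone.utc)  # date of this post
    start = day.replace(hour=start_utc_hour)
    end = day.replace(hour=end_utc_hour)
    if end <= start:
        # The window runs past midnight UTC, so it ends on the next day.
        end += timedelta(days=1)
    return (
        start.astimezone(tz).strftime("%H:%M"),
        end.astimezone(tz).strftime("%H:%M"),
    )


print(local_window(14, 4, US_EASTERN))  # ('10:00', '00:00')
print(local_window(14, 4, US_PACIFIC))  # ('07:00', '21:00')
```

The same helper could be pointed at the Amsterdam and Seoul windows with other offsets, though those clusters serve whole regions spanning several time zones, so any single offset is only a rough guide.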

Another interesting observation is that for the Tampa clusters there is a clear drop in requests about two-thirds of the way through the high period... which more or less corresponds to tea time (the evening meal). The latter part of the peak (the local evening) is still high, but not as high as during daylight.
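As a rough check on where that dip lands, here is a small Python sketch. It assumes the Tampa high period runs from 14:00 UTC for 14 hours, as read off the graph above, and it assumes fixed daylight saving offsets of UTC-4 and UTC-7 for the two US coasts; the function name point_in_window is hypothetical. Since "about two-thirds" is only an eyeball estimate, treat the result as approximate.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for early April 2007 (US daylight saving time in effect).
US_EASTERN = timezone(timedelta(hours=-4), "EDT")
US_PACIFIC = timezone(timedelta(hours=-7), "PDT")


def point_in_window(start_utc_hour, length_hours, numerator, denominator, tz):
    """Local clock time at numerator/denominator of the way through a UTC window."""
    start = datetime(2007, 4, 6, start_utc_hour, tzinfo=timezone.utc)
    # Work in whole minutes to avoid floating point rounding surprises.
    offset_minutes = length_hours * 60 * numerator // denominator
    moment = start + timedelta(minutes=offset_minutes)
    return moment.astimezone(tz).strftime("%H:%M")


# Two-thirds of the way through the 14:00 to 04:00 UTC high period.
print(point_in_window(14, 14, 2, 3, US_EASTERN))  # 19:20
print(point_in_window(14, 14, 2, 3, US_PACIFIC))  # 16:20
```

On these assumptions the dip falls in the early evening on the East Coast, which fits the evening meal reading; it is mid-afternoon on the West Coast, so the effect is presumably driven mostly by the eastern half of the country.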

So to tie this all together, Wikimedia projects get most of their use between roughly 8 am and 10 pm local time, no matter where in the world people are. People also use Wikimedia much more during the working week than on weekends. Finally, people use the projects less when they are eating dinner, and are less likely to return to browsing on a full stomach.

Now there's some food for thought.
