
I need to calculate the number of video and audio file downloads from our media server. Our media server only hosts audio/video files (mp3 and mp4) and we parse our IIS log files monthly using Stone Steps Webalizer.

When I look at the Webalizer stats, most of the ‘hits’ are ‘code 206 partial content’ and most of the remainder are ‘code 200 ok’. For instance, our most recent monthly Webalizer stats look something like this:

Total hits: 1,600,000
Code 200 - OK: 300,000
Code 206 - Partial Content: 1,300,000

The total hits figure is much larger than I would expect it to be in relation to the amount of data being served (Total Kbytes).

When I analyse the log files, it looks as though media players (iTunes, QuickTime etc.) create multiple 206s for a single download/play. I suspect that Webalizer does not group these multiple 206s from the same IP/visit and instead records each 206 as a ‘hit’, which vastly inflates the total hits figure. There is a criticism of Webalizer on its Wikipedia page which appears to confirm this: http://en.wikipedia.org/wiki/Webalizer

Am I correct about the 206s and Webalizer and, if so, how would I calculate the number of downloads? Is there an industry-standard methodology, and/or are there alternative web analytics applications that would be better suited to the task?

Any help or advice would be much appreciated.

3 Answers

4

Didn't receive any response to my question but thought I would give an update.

We have analysed a one hour sample of our log files and we have done some testing of different browsers / media players on an mp3 and mp4 file.

Here are our findings -

  • Some media players, particularly iTunes/QuickTime, produce a series of 206 responses but no 200 response.

  • Most but not all web browsers (Chrome is the exception) produce a
    200 response and no 206 responses when downloading a media file, i.e.
    download to desktop as opposed to playing in a desktop media player
    or media player plug-in.

  • If the file is cached by the browser/media player, it may produce a 304 response and no 200 or 206 response.
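
For anyone wanting to repeat this kind of check on their own logs, here is a rough Python sketch that tallies status codes per user agent. It assumes the logs are in the W3C extended format (the IIS default), where a `#Fields:` line names the columns, and that `cs(User-Agent)` and `sc-status` are among the logged fields. The function names and the sample lines are made up for illustration; this is not the script we used.

```python
from collections import Counter

def parse_w3c_lines(lines):
    """Yield one dict per request from W3C extended log lines."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns of the entries that follow.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split()
            # Skip malformed entries whose column count does not match.
            if len(values) == len(fields):
                yield dict(zip(fields, values))

def status_counts_by_agent(lines):
    """Count responses per (user agent, status code) pair."""
    counts = Counter()
    for row in parse_w3c_lines(lines):
        counts[(row["cs(User-Agent)"], row["sc-status"])] += 1
    return counts

# Made-up sample entries, for illustration only.
SAMPLE_LOG = """\
#Fields: date time c-ip cs-uri-stem cs(User-Agent) sc-status
2012-03-01 10:00:00 10.0.0.1 /media/ep1.mp3 iTunes/10.5 206
2012-03-01 10:00:02 10.0.0.1 /media/ep1.mp3 iTunes/10.5 206
2012-03-01 10:00:05 10.0.0.1 /media/ep1.mp3 iTunes/10.5 206
2012-03-01 10:05:00 10.0.0.2 /media/ep1.mp3 Mozilla/5.0+(Windows) 200
2012-03-01 10:09:00 10.0.0.2 /media/ep1.mp3 Mozilla/5.0+(Windows) 304
""".splitlines()

for (agent, status), n in sorted(status_counts_by_agent(SAMPLE_LOG).items()):
    print(agent, status, n)
```

A per-agent breakdown like this makes it easy to see which clients only ever show up with 206s and which show up with a single 200.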

Given the above we think it's impossible to count 'downloads' of media files from log file analysis unless the software has an intelligent algorithm designed specifically for that purpose. For example, it would need to group all requests for a specific media file from the same IP within a set time period (say 30 minutes) and count that as one download. As far as I'm aware there isn't any log file analysis software on the market which can offer that functionality.
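
To make the grouping idea concrete, here is a minimal Python sketch of it. It assumes the log has already been parsed into `(timestamp, ip, path)` tuples; the function name `count_downloads`, the 30-minute default and the sample hits are all hypothetical. It is only an illustration of the approach, not an industry-standard method.

```python
from datetime import datetime, timedelta

def count_downloads(hits, window=timedelta(minutes=30)):
    """Count downloads per file.

    Requests for the same file from the same IP that arrive within
    `window` of the first request in a group are counted as one download.
    """
    group_start = {}  # (ip, path) -> time of first request in the current group
    downloads = {}    # path -> number of groups seen so far
    for when, ip, path in sorted(hits):
        key = (ip, path)
        start = group_start.get(key)
        if start is None or when - start > window:
            # No group yet, or the current one has expired: start a new group.
            group_start[key] = when
            downloads[path] = downloads.get(path, 0) + 1
    return downloads

# Made-up hits, for illustration only.
day = datetime(2012, 3, 1)
hits = [
    (day.replace(hour=10, minute=0, second=0), "10.0.0.1", "/media/ep1.mp3"),
    (day.replace(hour=10, minute=0, second=2), "10.0.0.1", "/media/ep1.mp3"),
    (day.replace(hour=10, minute=10), "10.0.0.1", "/media/ep1.mp3"),
    (day.replace(hour=10, minute=45), "10.0.0.1", "/media/ep1.mp3"),
    (day.replace(hour=10, minute=5), "10.0.0.2", "/media/ep1.mp3"),
    (day.replace(hour=10, minute=6), "10.0.0.2", "/media/ep2.mp4"),
]

print(count_downloads(hits))
```

Note the obvious weaknesses: several listeners behind one shared IP address are merged into one download, and a single listener who pauses for longer than the window is counted twice.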

I did a quick Google search to find out more about podcast/video metrics / log file analysis and it does seem to be a very real, albeit niche problem. Google Analytics and other web metrics tools that use web beacons e.g. SiteStat, are not an option unless your media files are only available for download from your website i.e. no RSS or iTunes syndication etc. Even then I'm not sure if they could do the job.

I think this is why companies such as podtrac and blubrry offer specialised podcast/video measurement tools using redirects as opposed to log file analysis.

Podtrac http://podtrac.com/publisher/measurement

Blubrry http://www.blubrry.com/podcast_statistics/

If anyone has experience or expertise in this area feel free to chime in and offer advice or correct me if I'm wrong.

0

Try my software. I encountered the same issue with MP3s being split into multiple streams for iPods and iPhones. It is really easy to implement and works a treat.

Github

0

This is probably WAY too late to help you specifically, but if you have parsed your server logs and stored them somewhere sensible like a DBMS, a quick bit of SQL will give you the combined results you're after. Given a very simple log table where each 206 is recorded with a 'hit time', the IP address of the endpoint and an id/foreign key of the item fetched, you could run this query:

-- One row per IP address, episode and calendar day: the earliest hit of that day.
SELECT MIN(hit_time) AS hit_time, ip_address, episode_id
FROM podcast_hit
GROUP BY DATE(hit_time), ip_address, episode_id;

This groups all the 206 records so that each user (IP address) and episode is counted at most once per day, giving you more accurate stats. Hope this helps someone!
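
If you want to try the query without setting up a database server, something like the following Python sketch should do. It uses an in-memory SQLite database, which is my assumption (any DBMS with a `DATE()` function should behave similarly), and the rows are made-up sample data.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE podcast_hit (hit_time TEXT, ip_address TEXT, episode_id INTEGER)"
)
# Made-up 206 hits, for illustration only.
conn.executemany(
    "INSERT INTO podcast_hit VALUES (?, ?, ?)",
    [
        ("2012-03-01 10:00:00", "10.0.0.1", 1),
        ("2012-03-01 10:00:02", "10.0.0.1", 1),
        ("2012-03-01 18:30:00", "10.0.0.1", 1),
        ("2012-03-02 09:00:00", "10.0.0.1", 1),
        ("2012-03-01 10:05:00", "10.0.0.2", 1),
        ("2012-03-01 10:06:00", "10.0.0.2", 2),
    ],
)

# Same grouping as the query above, with an ORDER BY added for stable output.
rows = conn.execute(
    """
    SELECT MIN(hit_time) AS hit_time, ip_address, episode_id
    FROM podcast_hit
    GROUP BY DATE(hit_time), ip_address, episode_id
    ORDER BY 1, 2, 3
    """
).fetchall()

for row in rows:
    print(row)

# Each remaining row is one counted download, so tally them per episode.
per_episode = Counter(episode_id for _, _, episode_id in rows)
print(dict(per_episode))
```

Keep in mind that grouping by calendar day means a play that spans midnight is counted twice, and two genuine plays by the same IP on the same day are counted once.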

ojhilt