While analyzing our S3 access log files, I noticed that the »data transfer out per month« derived from the access logs (via S3stat and our own log file analysis) differs substantially from the values in your bills.
I have now run a test, downloading a file from one of our buckets, and it looks as if the access log files are incorrect.
On 03/02/2015 I uploaded a zip file to our bucket and then downloaded the complete file successfully over two different internet connections. One day later, on 04/02/2015, I analyzed the log files. Unfortunately, both entries have the value "-" for »Bytes Sent«. Amazon's »Server Access Log Format« (http://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html) says: »The number of response bytes sent, excluding HTTP protocol overhead, or "-" if zero.«
The corresponding entries look like this:
Bucket Owner Bucket [03/Feb/2015:10:28:41 +0000] RemoteIP - RequestID REST.GET.OBJECT Download.zip "GET /Bucket/Download.zip HTTP/1.1" 200 - - 760542 2228865159 58 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0" -
Bucket Owner Bucket [03/Feb/2015:10:28:57 +0000] RemoteIP - RequestID REST.GET.OBJECT Download.zip "GET /Bucket/Download.zip HTTP/1.1" 200 - - 860028 2228865159 23 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0" -
As you can see, both entries show a fairly long connection duration (»Total Time«): 0:12:40 and 0:14:20.
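These durations follow from the logged »Total Time« values, which the log format documentation defines in milliseconds. A minimal Python sketch of that conversion (the function name is hypothetical and only for illustration):

```python
def format_total_time(total_ms: int) -> str:
    """Convert an S3 "Total Time" value (milliseconds) to h:mm:ss."""
    total_seconds = total_ms // 1000  # drop the millisecond remainder
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


print(format_total_time(760542))  # first entry
print(format_total_time(860028))  # second entry
```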
Based on these findings, I then checked the log files of our main buckets for December 2014. Of 2332 relevant entries (all downloads of ZIP files from our bucket), 860 show this error.
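A check of this kind amounts to counting successful REST.GET.OBJECT entries whose »Bytes Sent« field is "-". The following is only a rough Python sketch, assuming the space-delimited field order from Amazon's »Server Access Log Format« up to and including »Bytes Sent«. The function name, the regular expression, and the sample lines (owner, bucket, IP address, request IDs) are hypothetical placeholders, not real log data:

```python
import re

# Matches the leading fields of an access log line, up to "Bytes Sent".
# Assumed order: owner, bucket, [time], remote IP, requester, request ID,
# operation, key, "request-URI", HTTP status, error code, bytes sent.
LOG_PREFIX = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+) (?P<request_uri>"[^"]*"|-) '
    r'(?P<status>\S+) (?P<error_code>\S+) (?P<bytes_sent>\S+)'
)


def count_missing_bytes_sent(lines):
    """Return (relevant, missing): successful GETs, and those with '-' bytes sent."""
    relevant = missing = 0
    for line in lines:
        m = LOG_PREFIX.match(line)
        if m is None:
            continue  # skip lines that do not look like log entries
        if m["operation"] != "REST.GET.OBJECT" or m["status"] != "200":
            continue  # only successful object downloads are of interest
        relevant += 1
        if m["bytes_sent"] == "-":
            missing += 1
    return relevant, missing


# Hypothetical sample lines, for illustration only.
SAMPLE_LINES = [
    'OWNERID examplebucket [03/Feb/2015:10:28:41 +0000] 192.0.2.1 - REQ1 '
    'REST.GET.OBJECT Download.zip "GET /examplebucket/Download.zip HTTP/1.1" '
    '200 - - 2228865159 760542 58 "-" "Mozilla/5.0" -',
    'OWNERID examplebucket [03/Feb/2015:10:30:00 +0000] 192.0.2.1 - REQ2 '
    'REST.GET.OBJECT Small.zip "GET /examplebucket/Small.zip HTTP/1.1" '
    '200 - 1024 1024 35 12 "-" "Mozilla/5.0" -',
    'OWNERID examplebucket [03/Feb/2015:10:20:00 +0000] 192.0.2.1 - REQ3 '
    'REST.PUT.OBJECT Download.zip "PUT /examplebucket/Download.zip HTTP/1.1" '
    '200 - - 2228865159 5000 40 "-" "Mozilla/5.0" -',
]

relevant, missing = count_missing_bytes_sent(SAMPLE_LINES)
print(f"{missing} of {relevant} successful GET entries have '-' as Bytes Sent")
```

The sketch deliberately reads only the fields up to »Bytes Sent«, so it does not depend on how the later fields (object size, timings, referrer, user agent) are quoted.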
Thus, the Amazon S3 access log files appear to be flawed and therefore unusable for our analysis.
Can anybody help me? Am I making a mistake, and if so, how can these log files be evaluated reliably?
Thanks Peter