
While analyzing our S3 access log files I have noticed that the »data transfer out per month« values derived from the access logs (via S3stat and our own log-file analysis) differ strongly from the values in your bills.

So I ran a test, downloading files from one of our buckets, and it looks like the access log files are incorrect.

On 03/02/2015 I uploaded a zip file to our bucket and then successfully downloaded the complete file over two different internet connections. One day later, on 04/02/2015, I analyzed the log files. Unfortunately, both entries have the value "-" in the "Bytes Sent" field. Amazon's »Server Access Log Format« documentation (http://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html) says: »The number of response bytes sent, excluding HTTP protocol overhead, or "-" if zero.«

The corresponding entries look like this:

BucketOwner Bucket [03/Feb/2015:10:28:41 +0000] RemoteIP - RequestID REST.GET.OBJECT Download.zip "GET /Bucket/Download.zip HTTP/1.1" 200 - - 760542 2228865159 58 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0" -

BucketOwner Bucket [03/Feb/2015:10:28:57 +0000] RemoteIP - RequestID REST.GET.OBJECT Download.zip "GET /Bucket/Download.zip HTTP/1.1" 200 - - 860028 2228865159 23 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0" -

As you can see, both log entries have quite long connection durations (»Total Time«): 0:12:40 and 0:14:20.
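For reference, reading those fields out of a log line amounts to something like the following sketch. It follows the documented field order (bucket owner, bucket, [time], remote IP, requester, request ID, operation, key, "request URI", status, error code, bytes sent, object size, total time, ...); the helper name and the sample line are my own invention, not an AWS tool.

```python
import re

# Match the first 14 documented fields of an S3 server access log line:
# bare tokens, one [bracketed] timestamp, and one "quoted" request URI.
LOG_RE = re.compile(
    r'^(\S+) (\S+) \[([^\]]+)\] (\S+) (\S+) (\S+) (\S+) (\S+) '
    r'"([^"]*)" (\S+) (\S+) (\S+) (\S+) (\S+)'
)

def parse(line):
    """Return status, bytes sent, object size and total time, or None."""
    m = LOG_RE.match(line)
    if not m:
        return None
    status, error, bytes_sent, object_size, total_time = m.groups()[9:14]
    return {
        "status": status,
        # "-" documents "zero bytes sent"; represent it as None here
        "bytes_sent": None if bytes_sent == "-" else int(bytes_sent),
        "object_size": int(object_size),
        "total_time_ms": int(total_time),
    }

# Invented sample line in the documented field order
line = ('owner bucket [03/Feb/2015:10:28:41 +0000] 1.2.3.4 - REQID '
        'REST.GET.OBJECT Download.zip "GET /bucket/Download.zip HTTP/1.1" '
        '200 - - 2228865159 760542')
print(parse(line))
```

With a parser like this, a Bytes Sent of "-" on a 2.2 GB download that took over twelve minutes stands out immediately.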

Based on these findings I then checked the log files of our main buckets for December 2014. Of the 2332 relevant entries (all ZIP files in our bucket), I found 860 entries with this error.
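The December check can be sketched roughly as follows. The tokenization and field positions follow the documented log format; the sample lines (and the function name) are invented for illustration.

```python
import re

# Split a log line into tokens: a "quoted" string, a [bracketed] timestamp,
# or a bare whitespace-separated field.
TOKEN_RE = re.compile(r'"[^"]*"|\[[^\]]*\]|\S+')

def zip_get_missing_bytes(lines):
    """Count object-GET entries for ZIP files, and how many of them
    have "-" in the Bytes Sent field."""
    relevant = affected = 0
    for line in lines:
        f = TOKEN_RE.findall(line)
        # 0-based positions per the documented order:
        # 6 = operation, 7 = key, 11 = bytes sent
        if len(f) > 11 and f[6] == "REST.GET.OBJECT" and f[7].lower().endswith(".zip"):
            relevant += 1
            if f[11] == "-":
                affected += 1
    return relevant, affected

# Two invented sample lines: the first has Bytes Sent "-", the second 12345
sample = [
    'owner bucket [03/Feb/2015:10:28:41 +0000] 1.2.3.4 - REQ1 REST.GET.OBJECT '
    'Download.zip "GET /bucket/Download.zip HTTP/1.1" 200 - - 2228865159 760542',
    'owner bucket [03/Feb/2015:10:29:00 +0000] 1.2.3.4 - REQ2 REST.GET.OBJECT '
    'Other.zip "GET /bucket/Other.zip HTTP/1.1" 200 - 12345 12345 100',
]
print(zip_get_missing_bytes(sample))  # (2, 1)
```

Run over our real December logs, this kind of count is how I arrived at 860 affected entries out of 2332.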

Thus, the Amazon S3 access log files seem flawed and useless for our analysis.

Can anybody help me? Am I making a mistake, and if so, how can these log files be reliably evaluated?

Thanks, Peter

  • Are the log entries above taken *directly* from the S3 log? The formatting looks like it has been changed, perhaps by a pre-processing job with an error in its pattern matching. The fields don't line up exactly as documented, the quotes don't seem properly aligned, and the leading zero in the number "028" also seems unusual. – Michael - sqlbot Feb 11 '15 at 13:06
  • Yes, the log entries are taken directly from the S3 log ... but for privacy reasons I have adapted the entries and introduced a few mistakes. – Peter Klarl Feb 13 '15 at 10:25
  • 860 028 should be 860028 – Peter Klarl Feb 13 '15 at 10:35
  • Do you have an idea what the problem could be in this case? – Peter Klarl Feb 17 '15 at 14:02
  • After two months of inquiry, Amazon confirmed that they have identified and fixed the issue. They are planning to include the fix in their next update to the S3 service. Unfortunately, they will probably not be able to fix the old logs; the support staff have asked the S3 team for clarification on this. When I know more, I'll post the solution and close this question. – Peter Klarl Mar 09 '15 at 10:52

1 Answer


After two months of inquiry, it looks like Amazon has fixed this issue. My first test for the period 13.03. to 16.03. shows no such errors anymore, and our S3stat analysis shows a massive (now correct) leap in »Daily Bandwidth« since 12.03.2015.

For more information you can look here: https://forums.aws.amazon.com/thread.jspa?messageID=606654

Peter