
I have recently hosted my files in Amazon S3, and I need the log files to calculate statistics for the "get", "put" and "list" operations on the objects.

I've observed that the log files are organized strangely. I don't know when a log will appear (not immediately, at least 20 minutes after the operation) or how many lines of log each log file will contain.

After that, I need to download these log files and analyse them, but I can't figure out how often I should do this.

Can somebody help? Thanks.

John Rotenstein
Lulu

2 Answers


What you describe (log files becoming available with a delay and in an unpredictable order) is exactly the behaviour AWS declares you should expect. It follows from the distributed nature of the system AWS uses to provide the S3 service: the same request may be served by a different server each time, and I have seen 5 different IP addresses involved in publishing the logs.

So the only solution is to accept the delay: measure the delay you actually experience, add some extra time, and learn to live with this total delay (I would expect something like 30 to 60 minutes, but your own statistics will tell you more).
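
To turn "measure the delay and add some extra time" into a concrete number, you could record, for each request you care about, how long it took until the log file containing it appeared in the target bucket, then take a high percentile of those delays plus a safety margin. A minimal sketch, assuming you collect these delays yourself; the sample values, the 95th percentile and the 10-minute margin are arbitrary assumptions, not AWS figures. Since AWS only promises best-effort delivery, no wait time guarantees that every record has arrived.

```python
import math
from datetime import timedelta


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggested_wait(delays, pct=95, margin=timedelta(minutes=10)):
    """Wait time before treating a period's logs as (mostly) delivered.

    delays: measured gaps between a request's timestamp and the moment the
    log file containing it appeared in the target bucket.
    """
    return percentile(delays, pct) + margin


# Hypothetical measurements, in minutes; replace them with your own.
observed = [timedelta(minutes=m) for m in (21, 25, 33, 28, 47, 22, 35, 58, 26, 31)]
print(suggested_wait(observed))  # 58 min (95th percentile) + 10 min margin
```

The resulting wait also answers "how often": for example, analyse each hour's requests only once that wait has passed after the hour ends.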

If you need the log records ordered, you either have to sort them yourself or look for a log processing solution; I have seen some applications offered for exactly this purpose.
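
Sorting them yourself is not much work, because each S3 server access log record starts with the bucket owner, the bucket, a bracketed timestamp such as `[06/Feb/2019:00:00:38 +0000]`, the remote IP, the requester, the request ID, the operation and the key. A minimal sketch that parses only those leading fields from lines gathered out of any number of log files and orders the records by request time; the function names are hypothetical, and the rest of each line is ignored.

```python
import re
from datetime import datetime

# Leading fields of an S3 server access log record:
# bucket_owner bucket [time] remote_ip requester request_id operation key ...
RECORD_RE = re.compile(
    r"^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) "
    r"(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+)"
)


def parse_record(line):
    """Return a dict with the leading fields of one log line, or None if it doesn't match."""
    m = RECORD_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Timestamps look like 06/Feb/2019:00:00:38 +0000
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def sorted_records(lines):
    """Parse lines from any number of log files and order them by request time."""
    records = [r for r in map(parse_record, lines) if r is not None]
    # list.sort() is stable: records within the same second keep their input order.
    records.sort(key=lambda r: r["time"])
    return records
```

Timestamps only have one-second resolution, so the order of requests within the same second cannot be recovered from these logs.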

If you really need your log records with a very short delay, you have to produce the logs yourself. This means writing and running a frontend that gives access to your files on S3 and, at the same time, does whatever logging you need.

I run such a solution: users get a user name, a password and the URL of my frontend. When they send a request, I check whether they provided proper credentials and whether they are allowed to see the given resource, and if so, I create a temporary URL for that resource, valid for a few minutes, and redirect the request to it.
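
The core decision of such a frontend can be sketched independently of any web framework: check credentials, check permission, write the log entry immediately, and either refuse or redirect to a short-lived URL. In this sketch the user store `USERS`, the function `handle_request` and the 300-second expiry are hypothetical, and URL signing is passed in as a `presign` callable so the logic can run without AWS credentials. With boto3, `presign` could wrap `client.generate_presigned_url("get_object", Params={"Bucket": ..., "Key": key}, ExpiresIn=expires)`.

```python
import hmac
import logging

log = logging.getLogger("frontend-access")

# Hypothetical user store: username -> (password, set of keys the user may read).
# A real deployment would store password hashes, not plain-text passwords.
USERS = {"alice": ("s3cret", {"reports/2013-10.csv"})}


def handle_request(username, password, key, presign, expires=300):
    """Decide how to answer one request; return (HTTP status, redirect URL or None).

    presign(key, expires) must return a temporary URL for `key`.
    """
    entry = USERS.get(username)
    if entry is None or not hmac.compare_digest(entry[0], password):
        status, location = 401, None  # unknown user or wrong password
    elif key not in entry[1]:
        status, location = 403, None  # authenticated, but not allowed to see this key
    else:
        status, location = 302, presign(key, expires)
    # Written at request time, so this log has no delivery delay. Configure a
    # logging formatter with %(asctime)s to get a timestamp on every entry.
    log.info("user=%s key=%s status=%d", username, key, status)
    return status, location
```

A web handler would then turn a 302 into a redirect with the returned URL in the `Location` header.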

But such a frontend costs money (you have to run it somewhere) and is less robust than accessing AWS S3 directly.

Good luck, Lulu.

Jan Vlcinsky
  • Thanks so much! I will try to download and analyze these logs for my statistics. Have a good day. – Lulu Oct 31 '13 at 09:13
  • Is it documented anywhere that the logging is delayed? :) – Jigar Dec 02 '14 at 06:59
  • @Jigar see "Best Effort Server Log Delivery" (http://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html). Quoting: "Server access log records are delivered on a best effort basis. Most requests for a bucket that is properly configured for logging will result in a delivered log record, and most log records will be delivered within a few hours of the time that they were recorded." – Jan Vlcinsky Dec 02 '14 at 12:52

A lot has changed since the question was originally posted. The delay is still there, but one of the OP's concerns was when to download the logs to analyze them.

One option right now would be to leverage Event Notifications: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/setup-event-notification-destination.html

This way, whenever an object is created in the access logs bucket, you can send a notification to SNS, SQS or Lambda, and in response download and analyze the new log file.
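
For the Lambda route, a minimal Python handler could read the bucket and key of each new log object from the event's `Records`, fetch its lines, and tally the operations the question asks about. The helper names are hypothetical, and so is the mapping from S3 operation names to get/put/list (`REST.GET.BUCKET` is assumed to be an object listing); check it against your own logs. Downloading goes through an injectable `fetch_lines` whose default uses boto3, which the Lambda Python runtime provides.

```python
from collections import Counter
from urllib.parse import unquote_plus

# Assumed mapping from S3 access-log operation names to the question's categories.
CATEGORIES = {
    "REST.GET.OBJECT": "get",
    "REST.PUT.OBJECT": "put",
    "REST.GET.BUCKET": "list",
}


def object_refs(event):
    """Yield (bucket, key) for every object referenced by an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys arrive URL-encoded in the notification (e.g. spaces as '+').
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def count_operations(lines):
    """Tally get/put/list requests using each log line's operation field."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # The bracketed time "[06/Feb/2019:00:00:38 +0000]" splits into two
        # tokens, so the operation ends up at index 7.
        if len(fields) > 7 and fields[7] in CATEGORIES:
            counts[CATEGORIES[fields[7]]] += 1
    return counts


def _fetch_with_boto3(bucket, key):
    import boto3  # imported lazily; bundled with the AWS Lambda Python runtime
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return body.decode("utf-8").splitlines()


def lambda_handler(event, context, fetch_lines=_fetch_with_boto3):
    totals = Counter()
    for bucket, key in object_refs(event):
        totals += count_operations(fetch_lines(bucket, key))
    # Store or forward the totals as needed; this sketch just returns them.
    return dict(totals)
```

Because each notification covers only the newly delivered log file, the totals are incremental; adding them up over time (for example in a database) gives the running statistics.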

Mateusz Mrozewski