4

I have two servers that use the same data stored in AWS S3. One handles HTTP traffic; the other runs longer-lived jobs. The HTTP server saves data to S3 and dispatches an event to the backend, which then processes the data.

However, every couple of minutes the backend hits an error because the data does not exist on S3 yet. Is there a delay between uploading a file to S3 and it becoming available to other client connections?

I'm sure the upload has finished by the time the backend accesses those files, and the HTTP server uses a keep-alive connection to S3.

skrat

2 Answers

3

While the other answer's suggestion of a blind timeout will likely work, I'd suggest looking into S3 event notifications. If you plan to scale out to more than two servers, using something like SNS or SQS to manage what happens once an object has been successfully written is going to make things much easier than relying on blanket timeouts.

You can read up on this functionality here: http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
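
As a rough illustration of the notification approach, here is a minimal sketch of how the backend might pull the bucket and key out of an S3 event message, assuming the notification is delivered straight to an SQS queue and the worker already has the message body as a string. The function name `written_objects` is made up for this example; the `Records`, `eventName`, and `s3` fields follow the documented S3 event message structure.

```python
import json
from urllib.parse import unquote_plus


def written_objects(message_body):
    """Return (bucket, key) pairs for object-created records in an S3 event.

    Hypothetical helper: message_body is the raw JSON string the worker
    received from the queue.
    """
    event = json.loads(message_body)
    pairs = []
    # Test events sent when a notification is first configured have no Records.
    for record in event.get("Records", []):
        # Only react to writes, not deletes or other event types.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys are URL-encoded in notifications, so decode them.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs
```

The backend would then process each returned pair instead of acting on the event dispatched by the HTTP server, so work only starts once S3 itself reports the object as written.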

Bill B
2

Due to how S3 distributes data once uploaded and how it load-balances front-end requests, it is quite conceivable that an object might take a couple of seconds to fully propagate around the infrastructure. I would recommend either putting a delay of a few seconds in your logic (between upload and retrieve) or detecting the failure, waiting a few seconds, and retrying.
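
The detect-and-retry variant could look something like the sketch below. It is deliberately library-agnostic: `fetch_with_retry`, its parameters, and the backoff values are all assumptions for illustration, and you would pass in whatever call your backend uses to read from S3.

```python
import time


def fetch_with_retry(fetch, is_missing, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch      -- zero-argument callable that returns the object or raises
    is_missing -- predicate deciding whether an exception means "not there yet"
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:
            # Re-raise anything that is not a "missing object" error,
            # and give up after the final attempt.
            if not is_missing(exc) or attempt == attempts - 1:
                raise
            # Exponential backoff: 0.5s, 1s, 2s, ...
            sleep(base_delay * 2 ** attempt)
```

With boto3, for example, this might be called as `fetch_with_retry(lambda: s3.get_object(Bucket=bucket, Key=key), lambda e: isinstance(e, s3.exceptions.NoSuchKey))`, where `s3` is a boto3 S3 client and `bucket` and `key` are your own values.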

EEAA