Putting file to S3 right after it's created

Question

I have two machines running different Java applications. Both run on Linux and use a common Windows share folder. One app triggers the other to generate a specific file (e.g. an image or a PDF), and then the first app tries to upload the generated file to S3. The problem is that I sometimes get this:

com.amazonaws.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received.

OR this:

com.amazonaws.AmazonClientException: Data read has a different length than the expected: dataLength=247898; expectedLength=262062; includeSkipped=false; in.getClass()=class com.amazonaws.internal.ResettableInputStream; markedSupported=true; marked=0; resetSinceLastMarked=false; markCount=1; resetCount=0

All the processes run synchronously, one after another (I have also checked the logs, which show no concurrent activity). Also, I am not setting the MD5 hash or the content length myself; the AWS SDK handles both.

So my guess is that the generating application has written a file and returned, but the OS is in fact still writing it in the background, which is why the first app gets an incomplete file.

I would really appreciate suggestions on how to handle such situations. Maybe there is a way to detect if the file is not currently being modified by the OS?

*Maybe there is a way to detect if the file is not currently being modified by the OS?* No, there is no reliable way, no matter what you may be told, because the file can always be modified after you check. *So my guess is that the generating application has written a file and returned* Maybe, maybe not. Does the Java code explicitly finish the write, flush the data to disk, and close the file? Or does it just let the writer object go out of scope and wait for garbage collection to flush and close the file implicitly? - Andrew Henle, Apr 26 '17 at 10:32
@AndrewHenle Well, I am pretty sure that all the streams are being closed properly... - Nestor Sokil, Apr 26 '17 at 13:02

david · Answer 1 · 2017-11-07T06:02:49.113

I was experiencing AmazonS3Exception: The Content-MD5 you specified did not match what we received. I finally solved it by addressing the first item in the list below, which was not terribly obvious.

Possible Solutions For Anyone Else:

1. Make sure not to use the same ObjectMetadata object across multiple putObject calls.
2. Consider disabling chunked encoding: client.setS3ClientOptions(S3ClientOptions.builder().disableChunkedEncoding().build())
3. Make sure the file isn't being edited while it's being uploaded.
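
For the last point, one possible approach is for the generating app to write under a temporary name, force the data to storage, close the file, and only then rename it to the final name, so the uploading app never finds a half-written file under the name it expects. Below is a minimal sketch of that idea using only the JDK. The class name AtomicPublish, the publish method, and the .part suffix are hypothetical and not from the original post. Whether force() and an atomic rename are actually honored on a network share depends on the mount and the server, so this is something to verify in your environment rather than a guaranteed fix.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: publishes a file under its final name only after
// all bytes have been written, forced to storage, and the file is closed.
final class AtomicPublish {

    static Path publish(Path target, byte[] data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Temp file lives in the same directory so the rename stays on one file system.
        // Note: createTempFile may apply more restrictive permissions than a normal create.
        Path tmp = Files.createTempFile(dir, target.getFileName().toString() + ".", ".part");
        try {
            try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ByteBuffer buffer = ByteBuffer.wrap(data);
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
                // Ask the OS to push content and metadata to the storage device.
                channel.force(true);
            } // the channel is closed here, before the rename
            try {
                return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Assumption: a plain replace is acceptable when atomic moves are unsupported.
                return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            // No-op after a successful move; removes the partial file on failure.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the shared folder.
        Path dir = Files.createTempDirectory("share-demo");
        byte[] content = "demo content".getBytes(StandardCharsets.UTF_8);
        Path published = publish(dir.resolve("report.pdf"), content);
        System.out.println(published + " is complete: " + Files.size(published) + " bytes");
    }
}
```

With this in place, the uploading app would only pick up files under their final name and ignore anything ending in .part.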
