I'm working on a desktop Java application. It needs to check for a specific file on my S3 server.

I don't want to download the entire file just to compare it. I need to find out whether the copy on the server is newer than the local one and, only if it is, download it and replace the local copy.

I'm not sure how to do the "check whether a newer version is available" part of this.

I've heard of hashing as a method, but I have little experience with how to actually do that on both sides (locally and via S3).

Sammy Guergachi

3 Answers

1

To get the hash of the remote file: How to get the md5sum of a file on Amazon's S3

To get the hash of the local file: Getting a File's MD5 Checksum in Java
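For the local side, a minimal sketch using only the JDK might look like the following. The class name `LocalMd5` is made up for illustration; `MessageDigest` and `Files.newInputStream` are standard JDK APIs. It streams the file in small chunks, so the whole file is never held in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of any library.
final class LocalMd5 {

    /** Opens the file, hashes it with MD5, and returns lowercase hex. */
    static String md5Hex(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return md5Hex(in);
        }
    }

    /** Hashes everything readable from the stream; the caller closes it. */
    static String md5Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Calling `LocalMd5.md5Hex(Paths.get("some/local/file"))` (the path is a placeholder) gives a string you can compare with whatever hash you obtain for the remote object.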

dnault
0

If you are also the one originally creating the file on S3, you can store custom ObjectMetadata with an MD5 (e.g. meta.setUserMetadata(mymap)) when you first putObject(), and then look this up with s3.getObjectMetadata().
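The comparison step for this approach can be sketched without touching the network. In the sketch below, the `Map<String, String>` stands in for the user metadata you would read back from the object's metadata, and the key name `md5` and the class name `Md5MetadataCheck` are assumptions made for illustration; use whatever key you stored at upload time.

```java
import java.util.Map;

// Hypothetical helper class; only the comparison logic is shown.
final class Md5MetadataCheck {

    // Assumed key name: it must match the key used when the object was uploaded.
    static final String MD5_KEY = "md5";

    /**
     * Returns true when the local copy should be replaced: the stored
     * hash is missing (we cannot prove the copies match) or it differs
     * from the local hash.
     */
    static boolean needsDownload(Map<String, String> userMetadata, String localMd5Hex) {
        String remote = (userMetadata == null) ? null : userMetadata.get(MD5_KEY);
        if (remote == null || localMd5Hex == null) {
            return true; // be conservative when either hash is unknown
        }
        return !remote.trim().equalsIgnoreCase(localMd5Hex.trim());
    }
}
```

Note that a differing hash only tells you the contents differ, not which copy is newer. If the server is the single source of truth, "different" is usually enough to trigger a download.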

DK_
0

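Compare the ETag programmatically. For an object uploaded in a single PUT (which limits it to at most 5 GB) and stored unencrypted or with SSE-S3, the ETag is the MD5 of the object's content.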

Compute hash for local file:

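String hash; // DigestUtils is org.apache.commons.codec.digest.DigestUtils (Apache Commons Codec)
try (InputStream in = new FileInputStream(path)) { hash = DigestUtils.md5Hex(in); } // closes the stream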

Get the ETag of the S3 object: see "Get Etag of S3 Object" (already mentioned by @dnault).

If you compute the hash as shown above, it should equal the ETag for any object that was uploaded in a single PUT (at most 5 GB) and is not encrypted with SSE-KMS or SSE-C.

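If the file is larger than 5 GB, or was uploaded as a multipart upload for any other reason, the ETag is not a plain MD5 of the content: see Multi-part MD5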
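For multipart uploads, the ETag is commonly observed to be the MD5 of the concatenated binary MD5 digests of each part, followed by `-` and the part count. That layout is not a documented guarantee, and it only reproduces if you know the part size the uploader used. Treat the following as a sketch under those assumptions; the class name `S3ETag` and its methods are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the AWS SDK.
final class S3ETag {

    /** ETags are often returned wrapped in double quotes; remove them. */
    static String stripQuotes(String etag) {
        return etag.replace("\"", "").trim();
    }

    /** A "-N" suffix suggests the object was uploaded in N parts. */
    static boolean isMultipart(String etag) {
        return stripQuotes(etag).contains("-");
    }

    /**
     * Computes md5(md5(part1) || md5(part2) || ...) + "-" + partCount,
     * assuming every part except the last is exactly partSize bytes.
     */
    static String multipartETag(InputStream in, long partSize) throws IOException {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        MessageDigest combined = newMd5(); // hashes the per-part digests
        MessageDigest part = newMd5();     // hashes the current part
        byte[] buf = new byte[8192];
        long inPart = 0;
        int parts = 0;
        int n;
        // Never read across a part boundary.
        while ((n = in.read(buf, 0, (int) Math.min(buf.length, partSize - inPart))) != -1) {
            part.update(buf, 0, n);
            inPart += n;
            if (inPart == partSize) {
                combined.update(part.digest()); // digest() also resets 'part'
                parts++;
                inPart = 0;
            }
        }
        if (inPart > 0) { // trailing partial part
            combined.update(part.digest());
            parts++;
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : combined.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex + "-" + parts;
    }

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A reasonable flow is to call `S3ETag.isMultipart(etag)` first: if it returns false, compare `S3ETag.stripQuotes(etag)` with the plain MD5 of the local file; if it returns true, compare it with `multipartETag(...)` using the part size you believe the uploader used. Neither comparison is expected to work for objects encrypted with SSE-KMS or SSE-C.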

Harshit
  • It also depends on SSE type https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html – anishtain4 Oct 17 '21 at 19:41
  • I'm not getting a matching value with DigestUtils.md5Hex(). It looks like they are doing something different, possibly hashing chunks of the file and then hashing the concatenation of those hashes plus the number of chunks? I'm looking at a 63 MB file. ETag: 98718d5bab00b4dad69ae30305b5b665-8. DigestUtils.md5Hex() for the same file: be79229e58c796b4a3b166e74e103808 – Jim Ford Sep 01 '22 at 18:08
  • OK, checking back, it looks like there are a couple of possibilities for my use case: (a) the bucket is encrypted, so the ETag is not predictable from a local file with the same content, and (b) my file was likely uploaded with multipart upload, which is the case mentioned above. Either way, I was not able to reproduce the ETag from the local file. – Jim Ford Sep 01 '22 at 18:18