I'm trying to figure out whether it's possible to download a specific file, or a specific range of bytes, from an uncompressed TAR archive stored in S3.
The use case can be described like this:
- The TAR file is generated by my application (so I control how it is built)
- The TAR file lives in an S3 bucket
- The TAR file is named archive.tar
- The TAR file contains two files: metadata.txt and payload.png
- metadata.txt is guaranteed to always be exactly "n" bytes, where "n" is relatively small
- payload.png can be any size, so it may be very large (> 1 GB)
- My application needs to download metadata.txt to understand how to process the TAR file, and it should not have to download the whole TAR file just to read metadata.txt
Ideally, at any given point, I should only ever have metadata.txt loaded in memory, never the entire TAR archive or any part of payload.png. I don't want to incur the network or memory overhead of downloading a huge TAR archive just to read the small metadata.txt file it contains.
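To make the idea concrete, here is a rough sketch of the byte-offset arithmetic I have in mind. It assumes metadata.txt is written as the first entry in the archive and is preceded by exactly one 512-byte header block (no PAX or GNU long-name extension records in front of it), which I have not verified for my TAR writer. The class and method names are made up for illustration, and nothing here talks to S3; it only computes the range and reads the name and size fields from a header block.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative and not part of any SDK.
final class TarMetadataRange {
    // TAR headers are 512 bytes, and entry data is padded to a multiple of 512.
    static final int BLOCK_SIZE = 512;

    // Inclusive {start, end} byte offsets of the first entry's data.
    // ASSUMPTION: metadata.txt is the first entry and has exactly one header block.
    static long[] firstEntryDataRange(long n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return new long[] { BLOCK_SIZE, BLOCK_SIZE + n - 1 };
    }

    // Value for an HTTP Range header covering only the first entry's data.
    static String rangeHeader(long n) {
        long[] r = firstEntryDataRange(n);
        return "bytes=" + r[0] + "-" + r[1];
    }

    // Entry name: header bytes 0..99, NUL-terminated ASCII.
    static String entryName(byte[] header) {
        int end = 0;
        while (end < 100 && header[end] != 0) {
            end++;
        }
        return new String(header, 0, end, StandardCharsets.US_ASCII);
    }

    // Entry size: header bytes 124..135, octal ASCII padded with NULs or spaces.
    // (Does not handle the GNU base-256 encoding used for very large sizes.)
    static long entrySize(byte[] header) {
        String field = new String(header, 124, 12, StandardCharsets.US_ASCII)
                .replace('\0', ' ')
                .trim();
        return field.isEmpty() ? 0L : Long.parseLong(field, 8);
    }

    public static void main(String[] args) {
        long n = 100; // example value only; the real n is application-specific
        System.out.println(rangeHeader(n));
    }
}
```

If the layout assumption holds, I would expect to be able to pass that range to a ranged GET (for example `GetObjectRequest.withRange(start, end)` in the AWS SDK for Java 1.x, which takes inclusive offsets) so that only those n bytes are transferred. As a sanity check, bytes 0-511 could be fetched first and run through `entryName` and `entrySize` to confirm the first entry really is metadata.txt with the expected size.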
I've noticed S3ObjectInputStream in the AWS SDK for Java, but I'm not sure how to use it with a TAR file for this use case.
Has anyone implemented something similar, or can anyone point me to references that would help with this?