I'm trying to extract specific files from Amazon S3 without having to read all the bytes because the archives can be huge and I only need 2 or 3 files out of it.
I'm using the AWS Java SDK. Here's the code (Exception Handing skipped):
AWSCredentials credentials = new BasicAWSCredentials("accessKey", "secretKey");
AWSCredentialsProvider credentialsProvider = new AWSStaticCredentialsProvider(credentials);
AmazonS3 s3Client = AmazonS3ClientBuilder.standard().withRegion(Regions.US_EAST_1).withCredentials(credentialsProvider).build();
S3Object object = s3Client.getObject("bucketname", "file.tar.gz");
S3ObjectInputStream objectContent = object.getObjectContent();
TarArchiveInputStream tarInputStream = new TarArchiveInputStream(new GZIPInputStream(objectContent));
TarArchiveEntry currentEntry;
while((currentEntry = tarInputStream.getNextTarEntry()) != null) {
if(currentEntry.getName().equals("1/foo.bar") && currentEntry.isFile()) {
FileOutputStream entryOs = new FileOutputStream("foo.bar");
IOUtils.copy(tarInputStream, entryOs);
entryOs.close();
break;
}
}
objectContent.abort(); // Warning at this line
tarInputStream.close(); // warning at this line
When I use this method it gives a warning that not all the bytes from the stream were read which I'm doing intentionally.
WARNING: Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
Is it necessary to drain the stream and what would be the downsides of not doing it? Can I just ignore the warning?