9

I have looked at both "AWS S3 Java SDK - Download file help" and "Working with Zip and GZip files in Java".

While they provide ways to download and deal with files from S3 and GZipped files respectively, these do not help in dealing with a GZipped file located in S3. How would I do this?

Currently I have:

try {
    AmazonS3 s3Client = new AmazonS3Client(
            new ProfileCredentialsProvider());
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));
    BufferedReader fileIn = new BufferedReader(new InputStreamReader(
            fileObj.getObjectContent()));
    String fileContent = "";
    String line = fileIn.readLine();
    while (line != null){
        fileContent += line + "\n";
        line = fileIn.readLine();
    }
    fileObj.close();
    return fileContent;
} catch (IOException e) {
    e.printStackTrace();
    return "ERROR IOEXCEPTION";
}

Clearly, I am not handling the compressed nature of the file, and my output is:

����sU�3204�50�5010�20�24��L,(���O�V�M-.NLOU�R�U�����<s��<#�^�.wߐX�%w���������}C=�%�J3��.�����둚�S�ᜑ���ZQ�T�e��#sr�cdN#瘐:&�
S�BǔJ����P�<��

However, I cannot use the example from the second question above, because the file is not stored locally; it has to be downloaded from S3.

What should I do?

Community
ylun.ca
  • Why can't you ungzip it and then read it to a file? – jstnchng Jul 01 '15 at 18:16
  • Because `fileObj` is an `S3Object`, so I cannot use the method described in [this](http://stackoverflow.com/questions/3711282/working-with-zip-and-gzip-files-in-java) – ylun.ca Jul 01 '15 at 18:19
  • Have you tried getting the S3Object, wrapping it in an input data stream, wrapping that in a Gzip stream, and then writing it out to a file? – jstnchng Jul 01 '15 at 20:19
  • Didn't work, but I did end up solving issue, will answer qn shortly @jstnchng – ylun.ca Jul 01 '15 at 23:21

5 Answers

9

I solved the issue using a Scanner instead of an InputStream.

The scanner takes the GZIPInputStream and reads the unzipped file line by line:

fileObj = s3Client.getObject(new GetObjectRequest(oSummary.getBucketName(), oSummary.getKey()));
fileIn = new Scanner(new GZIPInputStream(fileObj.getObjectContent()));
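For anyone who wants to try this without an S3 bucket, here is a minimal sketch of the same Scanner-over-GZIPInputStream idea. It assumes a ByteArrayInputStream standing in for `fileObj.getObjectContent()`; the class name `GzipScannerSketch` and the helpers `gzip` and `readLines` are made up for illustration and are not part of the AWS SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipScannerSketch {

    // Hypothetical helper: gzip-compresses text in memory so the sketch runs without S3.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Hypothetical helper: decompresses the stream and collects its lines.
    // With S3 you would pass fileObj.getObjectContent() here (assumption).
    static List<String> readLines(InputStream compressed) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(new GZIPInputStream(compressed), "UTF-8")) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        InputStream content = new ByteArrayInputStream(gzip("first line\nsecond line\n"));
        for (String line : readLines(content)) {
            System.out.println("Line: " + line);
        }
    }
}
```

Closing the Scanner also closes the wrapped streams, which is why the sketch uses try-with-resources.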
ylun.ca
  • Nice. I had the same issue, which I fixed a week or so ago: I put it into a GZIPInputStream and read from it with a BufferedReader. – jstnchng Jul 02 '15 at 13:45
  • 1
    Really, that worked for you? `BufferedReader` only takes a `Reader` as a parameter, so that should result in a compilation error, as `GZIPInputStream` is an `InputStream`, not a `Reader`. @jstnchng – ylun.ca Jul 02 '15 at 19:56
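On the point raised in the comments: a BufferedReader can read from a GZIPInputStream as long as an InputStreamReader sits in between to turn the bytes into characters. A rough sketch follows, assuming UTF-8 text and an in-memory stream in place of the S3 object content; `GzipReaderSketch`, `gzip`, and `readText` are illustrative names only.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipReaderSketch {

    // Hypothetical helper: gzip-compresses text in memory so the sketch runs without S3.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Hypothetical helper: InputStream -> GZIPInputStream -> InputStreamReader -> BufferedReader.
    // The charset is an assumption; use whatever encoding the file was written in.
    static String readText(InputStream compressed) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream content = new ByteArrayInputStream(gzip("alpha\nbeta\n"));
        System.out.print(readText(content));
    }
}
```

The StringBuilder also avoids the repeated string concatenation used in the question's loop, which gets slow for large files.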
6

You have to use GZIPInputStream to read a GZIP file:

    AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
            .withCredentials(new ProfileCredentialsProvider())
            .build();
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));

    byte[] buffer = new byte[1024];
    int n;
    FileOutputStream fileOutputStream = new FileOutputStream("temp.gz");
    // Decompress the S3 object content as it is read.
    BufferedInputStream bufferedInputStream = new BufferedInputStream(new GZIPInputStream(fileObj.getObjectContent()));

    // Re-compress the data while writing it to the local file.
    GZIPOutputStream gzipOutputStream = new GZIPOutputStream(fileOutputStream);
    while ((n = bufferedInputStream.read(buffer)) != -1) {
        // Write only the n bytes actually read, not the whole buffer.
        gzipOutputStream.write(buffer, 0, n);
    }
    gzipOutputStream.flush();
    gzipOutputStream.close();
    bufferedInputStream.close();

Try this approach to download a GZip file from S3.
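If the goal is a decompressed local copy rather than another .gz file, one possible variation is to let `Files.copy` drain the GZIPInputStream straight into the target path. This is only a sketch under that assumption: the in-memory stream stands in for the S3 object content, and `GzipToFileSketch`, `gzip`, and `decompressTo` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipToFileSketch {

    // Hypothetical helper: gzip-compresses text in memory so the sketch runs without S3.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Hypothetical helper: writes the decompressed content to target and
    // returns the number of decompressed bytes written.
    static long decompressTo(InputStream compressed, Path target) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("gzip-sketch", ".txt");
        InputStream content = new ByteArrayInputStream(gzip("hello from a gzipped stream\n"));
        long written = decompressTo(content, target);
        System.out.println("Wrote " + written + " bytes to " + target);
    }
}
```

`REPLACE_EXISTING` is needed here because `createTempFile` has already created the target file.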

John Stark
Ahmad Al-Kurdi
2

Try this:

    BasicAWSCredentials creds = new BasicAWSCredentials("accessKey", "secretKey");
    AmazonS3 s3 = AmazonS3ClientBuilder.standard().withCredentials(new AWSStaticCredentialsProvider(creds))
            .withRegion(Regions.US_EAST_1) // example value only; use your bucket's region
            .build();
    String bucketName = "bucketName";
    String keyName = "keyName";
    S3Object fileObj = s3.getObject(new GetObjectRequest(bucketName, keyName));
    // GZIPInputStream decompresses the object content as the Scanner reads it.
    try (Scanner fileIn = new Scanner(new GZIPInputStream(fileObj.getObjectContent()))) {
        while (fileIn.hasNextLine()) {
            System.out.println("Line: " + fileIn.nextLine());
        }
    }
1

I was trying to achieve the same thing with SDK 2.x. Because SDK 2 introduces a different design, I had to do a bit of research before arriving at a solution, so I am adding a code snippet here for the benefit of people using SDK 2.x.

    S3Client s3 = S3Client.builder()
            .region(region)
            .build();

    //Using the key, get the object
    GetObjectRequest request = GetObjectRequest.builder().bucket(bucketName).key(key).build();
    //Read the object as input stream
    InputStream inputStream = s3.getObject(request, ResponseTransformer.toBytes()).asInputStream();
    //Wrap it in a GZIP stream and read the decompressed text line by line;
    //try-with-resources closes the reader and the underlying streams
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(new GZIPInputStream(inputStream)))) {
        String contentStr;
        while ((contentStr = in.readLine()) != null) {
            //Process the contents
            System.out.println(contentStr);
        }
    } catch (IOException e) {
        //Handle the exception
    }
Mugdha
-1

I wasn't looking for this issue specifically, but I would like to improve this thread by explaining why the solution already provided works.

No, it is not because of the Scanner, as was suggested. It works because fileObj.getObjectContent() is wrapped in a GZIPInputStream, which decompresses the contents as they are read.

Remove the scanner but keep the GZIPInputStream and things will still work.
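To illustrate, here is a minimal sketch that drops the Scanner entirely and reads raw bytes from the GZIPInputStream. It assumes an in-memory stream in place of `fileObj.getObjectContent()`; `GzipPlainStreamSketch`, `gzip`, and `gunzip` are hypothetical names used only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipPlainStreamSketch {

    // Hypothetical helper: gzip-compresses text in memory so the sketch runs without S3.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Hypothetical helper: no Scanner and no Reader, only the GZIPInputStream doing the decompression.
    static byte[] gunzip(InputStream compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                // Copy only the bytes that were actually read in this pass.
                out.write(buffer, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = gunzip(new ByteArrayInputStream(gzip("still readable without a Scanner\n")));
        System.out.print(new String(plain, StandardCharsets.UTF_8));
    }
}
```

The Scanner (or a BufferedReader) is just a convenience for splitting the decompressed bytes into lines.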

Moinuddin Quadri