
I wrote a method to read information from an S3 object. There are multiple records in the S3Object; what's the best way to read all the lines? Does it only read the first line of the object? How can I make sure all the lines are read? Can anyone provide some suggestions?

    public Map<String, Object> readS3ObjectData(@NonNull S3Object s3Object) throws IOException {
        S3ObjectInputStream s3InputStream = s3Object.getObjectContent();
        BufferedReader reader = new BufferedReader(new InputStreamReader(s3InputStream, StandardCharsets.UTF_8));
        String line = "";
        Map<String, Object> map = new HashMap<>();
        while ((line = reader.readLine()) != null) {
            map = objectMapper.readValue(line, new TypeReference<Map<String, Object>>() {});
            LOGGER.info("Create Object mapper successfully");
        }
        reader.close();
        s3InputStream.close();
        return map;
    }

1 Answer


I wrote a method to read information from an S3 object.

It looks fine to me [1].

There are multiple records in the S3Object; what's the best way to read all the lines?

Your code should read all of the lines.

Does it only read the first line of the object?

No. It should read all of the lines [2]. That while loop reads until readLine() returns null, and that only happens when you reach the end of the stream.

How can I make sure all the lines are read?

If you are getting fewer lines than you expect, EITHER the S3 object contains fewer lines than you think, OR something is causing the object stream to close prematurely.

For the former, count the lines as you read them and compare that with the expected line count.
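
For example, a minimal counting sketch inside the loop from the question (lineCount and the log message are additions for illustration, not part of the original code):

    int lineCount = 0;
    while ((line = reader.readLine()) != null) {
        map = objectMapper.readValue(line, new TypeReference<Map<String, Object>>() {});
        lineCount++;  // count every line actually read
    }
    LOGGER.info("Read " + lineCount + " lines");  // compare with the expected count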

The latter could possibly be due to a timeout when reading a very large file. See "How to read file chunk by chunk from S3 using aws-java-sdk" for some ideas on how to deal with that problem.
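
As a rough illustration of the chunked approach, this sketch fetches one byte range with the v1 SDK's GetObjectRequest.withRange (s3Client, bucketName, and key are hypothetical placeholders):

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3Object;

    // Fetch bytes start..end (inclusive) of the object. Repeat with successive
    // ranges to walk the whole object; note that a line can straddle a chunk
    // boundary, so the caller has to stitch partial lines back together.
    S3Object readChunk(AmazonS3 s3Client, String bucketName, String key,
                       long start, long end) {
        GetObjectRequest request = new GetObjectRequest(bucketName, key)
                .withRange(start, end);
        return s3Client.getObject(request);
    }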


[1] Actually, it would be better if you used try-with-resources to ensure that the S3 stream is always closed. But that won't cause you to "lose" lines. (See the sketch after these notes.)
[2] This assumes that the S3 service doesn't time out the connection, and that you are not requesting a part (chunk) or a range in the URI request parameters; see https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html.
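
For note [1], a minimal sketch of the same method restructured with try-with-resources (behaviour otherwise unchanged; objectMapper is the same field as in the question):

    public Map<String, Object> readS3ObjectData(@NonNull S3Object s3Object) throws IOException {
        Map<String, Object> map = new HashMap<>();
        // try-with-resources closes the reader (and the wrapped S3 stream),
        // even if readValue throws part-way through.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(s3Object.getObjectContent(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // As in the original, each parse replaces the previous map.
                map = objectMapper.readValue(line, new TypeReference<Map<String, Object>>() {});
            }
        }
        return map;
    }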

Stephen C