2

I am using s3 select query along with where clause to retrieve data from s3. The query is working fine and returning the expected result when there's no where clause. Although when I am using where clause, the filtered data is correct, but the key in the object is the first row after the header and not the header.

Example : csv file

A B C

1 2 3

1 5 6

Query : select * from s3object s where s._1 = '1' limit 100

Expected Output : [{A : 1, B:2, C:3}, {A:1, B:5, C:6}]

Actual Output : [{1:1, 2:5, 3:6}]

This is the params object I am using to query :

let params = {
    Bucket: S3_BUCKET,
    Key: S3_PATH,
    Expression: "select * from s3object s where s._1 = '1' limit 100"
    ExpressionType: "SQL",
    InputSerialization: {
      CSV: {
        FileHeaderInfo: "NONE",
        RecordDelimiter: "\n",
        FieldDelimiter: ","
      }
    },
    OutputSerialization: {
      CSV: {}
    }
  };

I get the same output even when I use FileHeaderInfo : "USE", and change the query to select * from s3object s where id = '22' and s.date > '2020-05-01' limit 100

AWS Doc : https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html

user3807691
  • 1,284
  • 1
  • 11
  • 29

2 Answers2

2

So it seems, while fetching the query results from s3, it is impossible to get the headers as well. We can query with headerNames, or with columnNumber, but if we use the where clause, then we should use headerNames, and in that case, the header row doesn't come in the results.

So, I have now hardcoded the headers in my api call from where I am calling s3 select query, and appending those in the results.

user3807691
  • 1,284
  • 1
  • 11
  • 29
0

Change the params to the following should work.

let params = {
  Bucket: S3_BUCKET,
  Key: S3_PATH,
  ExpressionType: "SQL",
  Expression: "select * from s3object s where s.A = '1' limit 100"
  InputSerialization: {
    CSV: {
      FileHeaderInfo: "USE",
      RecordDelimiter: "\n",
      FieldDelimiter: ","
    }
  },
  OutputSerialization: {
    JSON: {}
  }
};
jellycsc
  • 10,904
  • 2
  • 15
  • 32
  • Sorry I forgot to add Expression in the question. Have updated it now – user3807691 May 21 '20 at 05:14
  • Although when I pass `FileHeaderInfo : "USE"`, then I get the following error : `Some headers in the query are missing from the file. Please check the file and try again.` – user3807691 May 21 '20 at 05:15