
I am trying to use S3 Select to query some data from a CSV file on S3 using the following query:

aws s3api select-object-content \
--bucket <bucket> \
--key <key> \
--expression "select `lineItem/intervalUsageStart` from s3object limit 100" \
--expression-type 'SQL' \
--input-serialization '{"CSV": {}, "CompressionType": "NONE"}' \
--output-serialization '{"CSV": {}}' "output.csv"

However, this fails with:

An error occurred (ParseUnExpectedKeyword) when calling the SelectObjectContent operation: Unexpected keyword found, KEYWORD:from at line 1, column 9.

I believe this is because I am using back-ticks to escape the column I want to get data from. If I don’t escape the column name, it fails with the following:

An error occurred (LexerInvalidChar) when calling the SelectObjectContent operation: Invalid character at line 1, column 16.

I guess this is because of the / in the column name. Is there a way I can get data from this particular column in this file? Thanks in advance!

jobin

1 Answer


S3 Select also supports referencing columns by position (_1 for the first column, _2 for the second, and so on), which works as a workaround here. Modifying the above query to:

aws s3api select-object-content \
--bucket <bucket> \
--key <key> \
--expression "select _2 from s3object limit 100" \
--expression-type 'SQL' \
--input-serialization '{"CSV": {}, "CompressionType": "NONE"}' \
--output-serialization '{"CSV": {}}' "output.csv"

(since lineItem/intervalUsageStart is the second column in the CSV) helped resolve the issue.
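
For reference, querying by header name should also be possible: S3 Select's SQL dialect uses double quotation marks (not backticks) for quoted identifiers, and the input serialization can be told to read column names from the header row via FileHeaderInfo. Something along these lines (an untested sketch, assuming the file's first row actually contains the headers; the alias s is only there for readability):

aws s3api select-object-content \
--bucket <bucket> \
--key <key> \
--expression 'select s."lineItem/intervalUsageStart" from s3object s limit 100' \
--expression-type 'SQL' \
--input-serialization '{"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"}' \
--output-serialization '{"CSV": {}}' "output.csv"

The expression is wrapped in single quotes here so the shell passes the inner double quotes through to S3 Select unchanged.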

jobin
  • What if I want to use such a column in a WHERE clause? I tried _10='xyz' and it didn't work, i.e. it threw the same error. – Asim Dec 29 '22 at 20:53