Amazon S3 Select Query : Column values get left shifted in CSV output when we query a non-existant key

Question

I have a JSON file in S3 of the format:

{
"A":"a",
"C":{"C1":"c1","C2":"c2"},
"E":"e"
}

And when I query like

select S3Object.A,S3Object.C1,S3Object.C2,S3Object.E from S3Object

I get below in CSV output:

A,C1,C2,E
a,e

I understand that the correct query is below:

select S3Object.A,S3Object.C.C1,S3Object.C.C2,S3Object.E from S3Object

and that will give me the output

A,C.C1,C.C2,E
a,c1,c2,e

But by querying a non-existant column, why does value e get shifted to show the value below C1 column instead of E column?

I could not find any documentation in Amazon AWS about the behavior if I query a non-existant column.

select S3Object.A,S3Object.D,S3Object.E from S3Object

The output being:

A,D,E
a,e

Is there any way to??

1. Hide D from column header when it doesn't exist in JSON source??

A,E
a,e

2. Or, leave the field as empty corresponding to the column header D??

A,D,E
a,,e

Try querying for one value at a time. Are you sure that `S3Object.C1` is valid? Should it actually be `S3Object.C.C1`? — John Rotenstein, Apr 01 '20 at 03:31
Hi @JohnRotenstein I think it shouldn't be valid, but it doesn't seem to be erring out. The output that I get is even more wierder where the column names appear as requested, but the column values are only available for two columns, and also it is shifted towards left, no matter what the column name order. So I get value e for C1. — Supriya, Apr 01 '20 at 06:04
I loaded your file into S3 and ran your query. The output appears to be JSON rather than CSV. How did you get the CSV-like output? — John Rotenstein, Apr 01 '20 at 06:44
I used the below imports `import com.amazonaws.services.s3.model.CSVInput;` `import com.amazonaws.services.s3.model.CSVOutput;` and `CSVOutput()` functionality — Supriya, Apr 01 '20 at 12:35
Ah! It seems that it the result of that library rather than the S3 Select service itself. Possibly because it is expecting a certain number of output columns, but missing fields are not processed. You'll probably need to process the JSON rather than the CSV. — John Rotenstein, Apr 01 '20 at 22:26

0 Answers0