How to fix "MissingHeaders" Error while appending where clause with s3 select

Question

I have a csv file in the format

IDATE_TIMESTAMP,OPEN,HIGH,LOW,CLOSE,VOLUME
1535535060,94.36,94.36,94.36,94.36,1
1535535120,94.36,94.36,93.8,93.8,1
1535535180,93.8,93.8,93.8,93.8,0
1535535240,93.8,93.8,93.74,93.74,1
1535535300,93.74,93.74,93.74,93.74,0
1535535360,93.74,93.74,93.74,93.74,0
1535535420,93.74,93.74,93.74,93.74,0
1535535480,93.74,93.74,93.74,93.74,0
1535535540,93.74,93.74,93.74,93.74,0
...

I have "from" and "to" timestamps that should filter the rows in the file and return the matching output. I am using Python + boto3 for S3 Select.

fromTs = "1535535480"
toTs = "1535535480"
query = """SELECT * FROM s3object s WHERE s."IDATE_TIMESTAMP" >= "%s" AND s."IDATE_TIMESTAMP" <= "%s" """%(fromTs, toTs)
request = client.select_object_content(
        Bucket=bucket,
        Key=filename,
        ExpressionType="SQL",
        Expression=query,
        InputSerialization={"CSV":{"FileHeaderInfo":"Use", "FieldDelimiter":",", "RecordDelimiter":"\n"}},
        OutputSerialization={"CSV":{}},
    )

botocore.exceptions.ClientError: An error occurred (MissingHeaders) when calling the SelectObjectContent operation: Some headers in the query are missing from the file. Please check the file and try again.

This is the error I am getting.

score 0 · Answer 1 · answered Jan 22 '20 at 20:45

I know this is a bit late and might not be the solution to your issue, but I was having a similar one.

My issue turned out to be that I was running S3 Select against an object encoded as UTF-8 with a BOM, rather than plain UTF-8. The 3-byte BOM was being interpreted as part of the first field in the CSV object, which corrupted the first column name.

As a result, rather than "IDATE_TIMESTAMP", S3 Select would see the first column as "xxxIDATE_TIMESTAMP" (where "xxx" stands for the three BOM bytes), and the query fails because the expected column is "missing".
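To check whether this applies to your file, you can look at the first three bytes of the object. The following is a minimal sketch, not a definitive fix: `has_utf8_bom`, `strip_utf8_bom`, and `rewrite_without_bom` are hypothetical helper names, and `client` is assumed to be a boto3 S3 client that you have already created.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw bytes start with the 3-byte UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM (unchanged if none)."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def rewrite_without_bom(client, bucket, key):
    """Hypothetical helper: re-upload the object without its BOM.

    Assumes `client` is a boto3 S3 client and that the object is small
    enough to read into memory. Returns True if the object was rewritten.
    """
    body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    if not has_utf8_bom(body):
        return False
    client.put_object(Bucket=bucket, Key=key, Body=strip_utf8_bom(body))
    return True
```

If `has_utf8_bom` returns True for the start of your file, re-saving the file as plain UTF-8 before uploading should have the same effect as `rewrite_without_bom`.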

score 0 · Answer 2 · edited Aug 22 '21 at 15:02

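The timestamp column needs to be cast to int, and the values should be passed as numbers rather than quoted strings. The following query:

fromTs = 1535535480
toTs = 1535535480

# Cast the CSV field to INT so it compares numerically with the bounds
query = """SELECT * FROM s3object s
WHERE CAST(s.IDATE_TIMESTAMP AS INT) >= {}
AND CAST(s.IDATE_TIMESTAMP AS INT) <= {}""".format(fromTs, toTs)

works in Python 3.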
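For completeness, here is a sketch of how that query could be plugged into the `select_object_content` call from the question and how the result stream could be read. `build_range_query` and `select_rows` are hypothetical helper names, and `client` is assumed to be a boto3 S3 client. As far as I can tell, S3 Select treats double-quoted tokens as column identifiers, so the bounds are interpolated as bare integers here instead of being wrapped in double quotes as in the question.

```python
def build_range_query(from_ts, to_ts):
    """Build the S3 Select expression for an inclusive timestamp range.

    int() validates the bounds and keeps them unquoted in the SQL text.
    """
    return (
        "SELECT * FROM s3object s "
        "WHERE CAST(s.IDATE_TIMESTAMP AS INT) >= {} "
        "AND CAST(s.IDATE_TIMESTAMP AS INT) <= {}"
    ).format(int(from_ts), int(to_ts))


def select_rows(client, bucket, key, query):
    """Run the query and return the matching CSV rows as one string.

    Assumes `client` is a boto3 S3 client and the object has a header row.
    """
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=query,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The response payload is an event stream; only "Records" events carry data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join the raw bytes before decoding so multi-byte characters are not split.
    return b"".join(chunks).decode("utf-8")
```

Usage would look like `select_rows(client, bucket, filename, build_range_query(fromTs, toTs))`, with `bucket` and `filename` as in the question.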

answered Aug 20 '21 at 20:06 by arp5 · edited Aug 22 '21 at 15:02 by Mohnish
