I am trying to make a select request from data stored in CSV format with AWS S3 SelectObjectContent
and an SQL expression. I am working with Laravel 6.
When I request all the data from the object ('Expression' => 'SELECT * from S3Object'
), everything works fine. I retrieve the data and can work with it.
The problem occurs when I want to use a WHERE
-clause in the SQL expression.
This is my code:
$client = new S3Client([
'region' => 'us-east-2',
'version' => 'latest',
]);
$results = $client->selectObjectContent([
'Bucket' => 'mybucketname',
'Key' => 'my_data_file.csv',
'ExpressionType' => 'SQL',
'Expression' => 'SELECT * FROM S3Object s WHERE s.continent = "Europe"',
'InputSerialization' => [
'CSV' => [
'FileHeaderInfo' => 'USE',
'RecordDelimiter' => "\n",
'FieldDelimiter' => ',',
],
],
'OutputSerialization' => [
'CSV' => [
'QuoteFields' => 'ASNEEDED',
'RecordDelimiter' => ",",
],
],
]);
I get the following error:
Error executing "SelectObjectContent"
<Error><Code>MissingHeaders</Code><Message>Some headers in the query are missing (truncated...) MissingHeaders (client): Some headers in the query are missing from the file. Please check the file and try again.
When I check the file (downloading it directly from the S3 bucket), in the first row I find the headers comma-separated, as I expected: continent,country,user_name,created_at
I also tried using positional headers and changed the expression to 'Expression' => 'SELECT * FROM S3Object s WHERE s._1 = "Europe"'
and 'FileHeaderInfo' => 'IGNORE'
, but that gave me another error:
Error executing "SelectObjectContent"
<Error><Code>InvalidColumnIndex</Code><Message>The column index at line 1, column (truncated...) InvalidColumnIndex (client): The column index at line 1, column 39 is invalid. Please check the service documentation and try again.
I am clueless how to proceed. I already saw this post, but it didn't help. Maybe someone has a clue? Feel free to tell me what further information you could need for helping me solve this issue!
Thanks in advance!
EDIT
I tried querying the csv file via the AWS console and everything works fine: The SQL expression in which I use the file header info as well as the one where I use the positional headers instead. Transferring the SQL expression in my controller I get the same errors I mentioned above:
Missing Headers for the query 'SELECT * FROM S3Object WHERE continent = "Europe"'
(and 'FileHeaderInfo' => 'USE'
)
Invalid Column Index for the query 'SELECT * FROM S3Object s WHERE s_1 = "Europe"'
(and 'FileHeaderInfo' => 'NONE'
)