Questions tagged [amazon-s3-select]

Amazon S3 Select enables applications to retrieve only a subset of data from an Amazon S3 object by using simple SQL expressions.

See: SQL Functions - Amazon Simple Storage Service

91 questions
2
votes
1 answer

Use case of Amazon S3 Select

I took a look at the link and am trying to understand what S3 Select is. Most applications have to retrieve the entire object and then filter out only the required data for further analysis. S3 Select enables applications to offload the heavy lifting…
Isaac
  • 12,042
  • 16
  • 52
  • 116
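
To make the idea in this excerpt concrete, here is a minimal sketch in Python of how an application might ask S3 to filter a CSV object server-side instead of downloading it whole. It assumes the boto3 `select_object_content` call; the bucket, key, column names, and the helper names `build_select_request` and `collect_records` are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Iterable


def build_select_request(bucket: str, key: str, expression: str) -> Dict[str, Any]:
    """Build keyword arguments for boto3's S3 client select_object_content call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Assumption: the object is a CSV file whose first line is a header row.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        # Ask for one JSON document per matching row.
        "OutputSerialization": {"JSON": {}},
    }


def collect_records(events: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the Records payloads from a select_object_content event stream."""
    chunks = []
    for event in events:
        # Only "Records" events carry data; Stats, Progress, Cont and End are skipped.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Decode once at the end so multi-byte characters split across chunks survive.
    return b"".join(chunks).decode("utf-8")


# Usage against a real bucket (requires boto3 and AWS credentials; names are made up):
#   import boto3
#   s3 = boto3.client("s3")
#   request = build_select_request(
#       "example-bucket", "data/people.csv",
#       "SELECT s.name FROM s3object s WHERE s.city = 'Oslo'",
#   )
#   response = s3.select_object_content(**request)
#   print(collect_records(response["Payload"]))
```

Only the rows matching the `WHERE` clause travel over the network, which is the "offloading" the excerpt describes.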
2
votes
2 answers

Query header in s3 select nodejs

I am using an S3 Select query with a WHERE clause to retrieve data from S3. The query works fine and returns the expected result when there is no WHERE clause. However, when I use a WHERE clause, the filtered data is correct, but the key…
user3807691
  • 1,284
  • 1
  • 11
  • 29
2
votes
1 answer

Is it possible to execute nested queries in S3 Select?

I am trying to execute a query of the type: SELECT * FROM (SELECT * FROM s3object s WHERE ..) But I get the following error: Invalid Data Source type. So does S3 Select support nested queries, or am I missing something?
2
votes
1 answer

AWS S3 Select - Retrieve data from 2 different levels of a json

I have this JSON stored in an S3 file (which is actually the output of an AWS Comprehend EntitiesDetection job, meaning I unfortunately have no control over how this JSON is organized; it is uploaded to S3 by the AWS job itself, so I can't modify the…
Mathieu
  • 4,587
  • 11
  • 57
  • 112
2
votes
1 answer

SQL expression to fetch count of an object in a json file using AWS S3-select

I have a JSON file in S3 with the following structure { status: "Success", created_at: "19 AUG 2019", employees:[ {"name":"name1", "id":"1"}, {"name":"name2", "id":"2"}, {"name":"name3", "id":"3"} ], contacts: [] } The…
Achaius
  • 5,904
  • 21
  • 65
  • 122
2
votes
0 answers

s3 select : How to get column names of parquet files?

I am using S3 Select to read the first 10 rows of a large Parquet file stored in an S3 bucket. I am able to get the first 10 rows in CSV format, but they come without any header: only rows, no column names. Is there any way to get headers…
CodeHunter
  • 2,017
  • 2
  • 21
  • 47
2
votes
1 answer

How can I get the count of sub-object in a s3 json file using s3-select?

I am storing my json files in aws-s3 using Ruby-on-Rails. The object looks like, { status: "Success", created_at: "19 Jan 2019", employees:[ {"name":"name1", "id":"1"}, {"name":"name2", "id":"2"}, {"name":"name3", "id":"3"} …
Beu
  • 1,370
  • 10
  • 23
2
votes
0 answers

Can I generate a presigned s3 url that incorporates an s3-select expression?

I'm storing large datasets in s3 and want to create presigned urls to hand out to clients who want to download selected columns from a dataset. The (java) sdk does not seem to offer a pre-packaged way to do this. Has Amazon made any explicit…
2
votes
1 answer

How to delete a file from amazon S3 bucket using cURL

I was trying to delete a file from an S3 bucket that is hosted in my client's in-house storage, s3.fidapp.org. I used the command below, but it didn't work and I'm getting the error below. SignatureDoesNotMatch: The request signature we…
2
votes
2 answers

botocore.exceptions.ClientError: An error occurred (InvalidTextEncoding) when calling the SelectObjectContent operation

While executing the code below through Python: response= S3.select_object_content(Bucket=S3_bucket_name,Key=S3_file_Key,ExpressionType='SQL', Expression="select count(*) from s3object", InputSerialization={'CSV': {"FileHeaderInfo":…
Monika Bhajipale
  • 503
  • 1
  • 7
  • 9
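
The excerpt is cut off, so the actual cause of this error is not known here. S3 Select expects CSV and JSON input to be UTF-8 encoded, so one thing worth ruling out is an object saved in another encoding. The sketch below is a local check only; the helper names `is_valid_utf8` and `reencode_to_utf8` are hypothetical, and the Latin-1 source encoding is an assumption made for the example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known source encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Example: a CSV row written as Latin-1 (assumed here) is not valid UTF-8.
latin1_row = "café,1\n".encode("latin-1")
fixed_row = reencode_to_utf8(latin1_row, "latin-1")
print(is_valid_utf8(latin1_row), is_valid_utf8(fixed_row))
```

A compressed object whose `CompressionType` is not declared would fail the same local check, so this does not by itself tell the two cases apart.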
2
votes
0 answers

Strange results from using AWS S3 SELECT to get CSV data into SQL table

I have written an AWS State Machine in C# to load data from a CSV file from an S3 Bucket, into a SQL Server database table but I'm getting really odd data into the table. The two main functions are as follows, the first gets the response payload,…
JamesMatson
  • 2,522
  • 2
  • 37
  • 86
2
votes
1 answer

Encoding Error Using AWS S3 Select with the AWS SDK for Ruby

I am trying to do the following: download the output of an Athena query from S3 (file.csv); gzip the output and upload to a different S3 location (file.csv.gz); use S3 Select from within the Ruby SDK to query the contents of file.csv.gz. I always get…
Sam in Oakland
  • 113
  • 1
  • 7
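
The workflow in this excerpt (gzip a CSV, then query the compressed object) can be sketched as follows. This is Python rather than the Ruby SDK the question uses, and it shows only the request shape as boto3 names it: the input serialization has to declare `CompressionType` as `GZIP` for a gzipped object. The helper names are hypothetical, and the excerpt does not say which error the asker actually hit.

```python
import gzip
from typing import Any, Dict


def gzip_csv(csv_bytes: bytes) -> bytes:
    """Compress raw CSV bytes with gzip before uploading them to S3."""
    return gzip.compress(csv_bytes)


def gzip_csv_input_serialization(has_header: bool = True) -> Dict[str, Any]:
    """Build the InputSerialization value for a gzip-compressed CSV object."""
    return {
        "CSV": {"FileHeaderInfo": "USE" if has_header else "NONE"},
        # Tell S3 Select to decompress the object before parsing it.
        "CompressionType": "GZIP",
    }


# Local round trip on made-up data; no AWS call is made here.
original = b"id,name\n1,alice\n2,bob\n"
compressed = gzip_csv(original)
assert gzip.decompress(compressed) == original
print(gzip_csv_input_serialization())
```

The dictionary would be passed as the `InputSerialization` argument of `select_object_content`, alongside the bucket, key, and SQL expression.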
2
votes
1 answer

Receive circular reference error from AWS S3 Select query using s3api to count lines in a file in S3

I'm trying to count the number of lines in a file stored in an S3 bucket using AWS SELECT. Specifically, executing the following command (based upon AWS s3api documentation and this Java example for the count(*) query): aws s3api…
1
vote
1 answer

S3 Select Query JSON for nested value when keys are dynamic

I have a JSON object in S3 which follows this structure: : { : } For example, { "code_abc": { "client_1": 1, "client_2": 10 }, "code_def": { "client_2": 40, …
nvergos
  • 432
  • 3
  • 15
1
vote
1 answer

How can I write Parquet files with int64 timestamps (instead of int96) from AWS Kinesis Firehose?

Why do int96 timestamps not work for me? I want to read the Parquet files with S3 Select. S3 Select does not support timestamps saved as int96 according to the documentation. Also, storing timestamps in parquet as int96 is deprecated. What did I…
Faber
  • 1,504
  • 2
  • 13
  • 21