Questions tagged [amazon-s3-select]

Amazon S3 Select enables applications to retrieve only a subset of data from an Amazon S3 object by using simple SQL expressions.

See: SQL Functions - Amazon Simple Storage Service

91 questions
0
votes
1 answer

Empty Column not being listed in S3 select in databricks

I'm querying a JSON file in S3 with multiple columns: SELECT a, b, c FROM json.`s3://my-bucket/file.json.gz` And the file looks like this: {a: {}, b: 0, c: 1} {a: {}, b: 1, c: 2} {a: {}, b: 2, c: 3} The query above fails and…
tty
0
votes
0 answers

Amazon S3 Select: getting truncated data when querying large JSON object

I have a big json document stored in S3 with a structure like this: { "result": { "id": "123", "commits": ["comm1", "comm2", ..., "commN"] } } There are other fields there too and the number of commits can go into thousands. When I…
Juraj Martinka
0
votes
0 answers

S3 select aws console

I have the following data in a file: {"new_date":"2022-06-09","code":34,"value":33,"id":18} {"new_date":"2022-06-09","code":34,"value":36,"id":19} {"new_date":"2022-06-09","code":34,"value":35,"id":15} In the AWS console I'm trying to execute the…
user1555190
0
votes
2 answers

botocore.exceptions.ClientError: An error occurred (ParseSelectMissingFrom) when calling the SelectObjectContent operation: Missing FROM after SELECT

Goal: Use S3 Select to extract columns from a .parquet on S3. I've tried various queries. Including the key in the query makes no difference. Code: s3 = boto3.client('s3') s3_uri = 's3://my-bucket/my-folder/' bucket, prefix =…
DanielBell99
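A minimal sketch of a Parquet column projection, not the asker's code. It assumes boto3's `select_object_content` keyword arguments; the helper name `build_parquet_select` and the bucket, key, and column names are hypothetical. Note that `SelectObjectContent` works on a single object key rather than a prefix, and the expression needs an explicit `FROM s3object` clause.

```python
def build_parquet_select(bucket, key, columns):
    """Build keyword arguments for boto3's S3 client select_object_content."""
    if not columns:
        raise ValueError("at least one column is required")
    # Double-quote identifiers so mixed-case column names are matched exactly.
    projection = ", ".join(f's."{column}"' for column in columns)
    return {
        "Bucket": bucket,
        "Key": key,  # a single object key, not a folder prefix
        "ExpressionType": "SQL",
        "Expression": f"SELECT {projection} FROM s3object s",
        "InputSerialization": {"Parquet": {}},
        "OutputSerialization": {"JSON": {}},
    }
```

The result can be passed along as `boto3.client("s3").select_object_content(**kwargs)`; printing `kwargs["Expression"]` first makes it easy to check what the service will parse.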
0
votes
2 answers

Getting partial json response for s3select with aws java sdk v2

I am trying to implement S3 Select in a Spring Boot app to query a Parquet file in an S3 bucket, but I am only getting a partial result from the S3 Select output. Please help identify the issue; I am using the AWS Java SDK v2. Upon checking the json…
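One common reason for partial-looking output is that results arrive as several `Records` events whose payloads can end in the middle of a record. The sketch below is in Python (boto3 event shape) rather than the Java SDK v2 the question uses, and `collect_records` is a hypothetical helper name. It joins every payload before parsing, assuming newline-delimited JSON output.

```python
import json


def collect_records(event_stream):
    """Join all Records payloads, then parse newline-delimited JSON."""
    chunks = []
    for event in event_stream:
        # Stats, Progress, Cont and End events carry no record data.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Decode only after joining: a chunk may end mid-record or mid-character.
    text = b"".join(chunks).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line.strip()]
```

With boto3 the stream would be `response["Payload"]`; the same idea (buffer every chunk, parse once the stream ends) applies to the Java response handler.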
0
votes
1 answer

How can we search the json file based on date field using S3 select

I am trying to query data in a JSON file using S3 Select. I am unable to filter based on the date field. I have tried using current_date, sysdate and a few CAST options too. I am planning to compute the current date and send it from the API. Before that, I…
Ashwin Kumar
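A sketch of the "compute the date in the API and send it" approach, assuming the field holds ISO-8601 date strings such as `2022-06-09`. The helper name `build_date_filter` and the field name used in the usage note are hypothetical, since the excerpt does not show the data.

```python
from datetime import date


def build_date_filter(field, day):
    """Return an S3 Select expression matching records on a given day."""
    # Reject anything that is not a plain identifier to avoid SQL injection.
    if not field.isidentifier():
        raise ValueError(f"unexpected field name: {field!r}")
    # Compare as strings; this assumes the stored value is YYYY-MM-DD text.
    return f"SELECT * FROM s3object s WHERE s.{field} = '{day.isoformat()}'"
```

For example, `build_date_filter("new_date", date.today())` yields an expression that can be used as the `Expression` argument of the request.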
0
votes
1 answer

Getting OverMaxRecordSize when querying through S3 select

Getting the following error: "The character number in one record is more than our max threshold, maxCharsPerRecord: 1,048,576" while running any query and trying to fetch any record. I've tried changing from JSON schema to CSV but that hasn't worked.…
fvnbab
0
votes
1 answer

AWS SDK2 java s3 select example - how to get result bytes

I am trying to use the AWS SDK v2 for Java for S3 Select operations but am not able to extract the final data. Looking for an example if someone has implemented it. I got some idea from this post but am not able to figure out how to get and read the full…
0
votes
0 answers

AWS S3 Select CSV WHERE filtering not working on last column

I have a csv file in my s3 that looks like this name,status,age,loc aaa,aaa,1,zz bbb,bbb,2,yy ccc,,3,pp ddd,ddd,4,aaa SELECT * FROM s3object s WHERE name ='aaa' This query returns first row correctly. SELECT * FROM s3object s WHERE loc ='aaa' This…
lclankyo
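One possible cause, stated as an assumption because the excerpt is truncated: if the file uses Windows line endings (`\r\n`) and records are split on `\n` only, every last-column value keeps a trailing `\r`, so `loc = 'aaa'` never matches. The local sketch below only illustrates that effect on sample text; it does not call S3, and both helper names are hypothetical.

```python
def has_crlf_endings(sample: bytes) -> bool:
    """Report whether a raw sample of the file contains CRLF line endings."""
    return b"\r\n" in sample


def last_column_values(text, record_delimiter="\n"):
    """Split text into records and return each record's last column."""
    rows = [row for row in text.split(record_delimiter) if row]
    return [row.split(",")[-1] for row in rows]


sample = "name,status,age,loc\r\naaa,aaa,1,zz\r\nddd,ddd,4,aaa\r\n"
print(last_column_values(sample))          # values keep a trailing '\r'
print(last_column_values(sample, "\r\n"))  # values are clean
```

If `has_crlf_endings` returns true for the first bytes of the object, normalizing the line endings before upload is one way to test whether this is the problem.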
0
votes
1 answer

how to read json.snappy file from athena

I have an input file in an S3 bucket with .json.snappy compression and I am trying to read it through an Athena table. I tried using different SerDes, 'org.apache.hive.hcatalog.data.JsonSerDe' & 'org.openx.data.jsonserde.JsonSerDe', but it didn't work, Athena…
PB22
0
votes
0 answers

using s3 select I need to query JSON file. need some examples code snippets

Using S3 Select, I need to query a JSON file. I need some example code snippets using boto3. Thanks in advance, Sundar
Sundar
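A minimal boto3-style sketch for querying a JSON object. The helper names `build_json_select` and `run_select`, and the bucket, key, and query in the usage note, are hypothetical. `Type` is `LINES` for one JSON object per line and `DOCUMENT` for a single JSON document.

```python
def build_json_select(bucket, key, expression, json_type="LINES"):
    """Build keyword arguments for boto3's S3 client select_object_content."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {"JSON": {"Type": json_type}},
        "OutputSerialization": {"JSON": {"RecordDelimiter": "\n"}},
    }


def run_select(client, kwargs):
    """Run the query and return the result as one decoded string."""
    response = client.select_object_content(**kwargs)
    # The payload is an event stream; only Records events carry data.
    payload = b"".join(
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    )
    return payload.decode("utf-8")
```

Typical usage would be `run_select(boto3.client("s3"), build_json_select("my-bucket", "data.json", "SELECT s.id FROM s3object s"))`, which needs `import boto3` and valid AWS credentials.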
0
votes
0 answers

S3 Select (python) does not return the header when using WHERE clause despite FileHeaderInfo=NONE

When I submit this query: SELECT * FROM s3object with the FileHeaderInfo of the input serialization set to NONE, I get the records expected with their header. As soon as I add a where clause like this: SELECT * FROM s3object WHERE _3 =…
Jeff Saremi
0
votes
1 answer

ClientError: An error occurred (InvalidTextEncoding) when calling the SelectObjectContent operation: UTF-8 encoding is required. reading gzip file

I am getting the above error in my code. encoding=latin-1 needs to be included as a parameter somewhere in select-object-content. Since I am new to this, I am not sure where to add it. Can anyone help me with this? Code: client =…
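A hedged sketch of two things worth checking, since the error message says UTF-8 is required and, as far as I know, the request has no encoding parameter. First, a gzip object needs `CompressionType` set to `GZIP` in the input serialization. Second, Latin-1 content would have to be re-encoded before upload. All three helper names are hypothetical.

```python
import gzip


def gzip_csv_input(file_header_info="USE"):
    """InputSerialization for a gzip-compressed CSV object."""
    return {"CSV": {"FileHeaderInfo": file_header_info}, "CompressionType": "GZIP"}


def is_utf8_gzip(raw: bytes) -> bool:
    """Check locally whether gzip bytes decompress to valid UTF-8."""
    try:
        gzip.decompress(raw).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def reencode_latin1_gzip(raw: bytes) -> bytes:
    """Convert gzip-compressed Latin-1 text to gzip-compressed UTF-8."""
    text = gzip.decompress(raw).decode("latin-1")
    return gzip.compress(text.encode("utf-8"))
```

Running `is_utf8_gzip` on a downloaded copy of the object shows which of the two cases applies before changing the request.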
0
votes
1 answer

JS : Getting datatype from CSV using Amazon S3 Select

I am trying to read a CSV from an Amazon S3 bucket (this could be any CSV, so I do not have the header/datatype info ahead of the read). I am able to get the header info using: const params = { Bucket: 'mybucket', Key: file, ExpressionType:…
user14013917
0
votes
1 answer

S3-Select Pricing on JSON

I am confused about the S3 select pricing regarding data returned and data scanned. If I want to access something at an index in a json file, does it still scan the entire file and the data scanned counts for the entire file size? Suppose I use the…
Nock
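One way to measure scanned versus returned bytes rather than guess is to read the `Stats` event that the response stream includes. This is a sketch assuming boto3's event shape; `read_stats` is a hypothetical helper and it does not compute a price.

```python
def read_stats(event_stream):
    """Return byte counters from the Stats event, or None if absent."""
    for event in event_stream:
        if "Stats" in event:
            details = event["Stats"]["Details"]
            return {
                "scanned": details["BytesScanned"],
                "processed": details["BytesProcessed"],
                "returned": details["BytesReturned"],
            }
    return None
```

The stream can only be iterated once, so in practice the stats would be picked up in the same loop that collects the `Records` payloads.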