Questions tagged [amazon-s3-select]

Amazon S3 Select enables applications to retrieve only a subset of data from an Amazon S3 object by using simple SQL expressions.

See: SQL Functions - Amazon Simple Storage Service

91 questions
1
vote
1 answer

Write an S3 Select query to exclude rows containing a carriage return (\r)

I have a CSV column whose data contains the \r character. How can I write a query to eliminate such data? SELECT rv FROM s3object s gives me: I don't want such rows and want to eliminate them all. This query still returns the same results: SELECT rv…
Denil Parmar
  • 154
  • 12
1
vote
1 answer

Getting maxCharsPerRecord: 1,048,576 in S3 in AWS S3 SelectObjectContent

I am fetching records from an S3 JSON file using S3 Select. Everything works when I fetch data from small JSON files, i.e. 2 MB (with a record count of around 10,000). Following is my query: innerStart = 1 innerStop = 100 maximumLimit = 100 query =…
Dibish
  • 9,133
  • 22
  • 64
  • 106
1
vote
0 answers

How to use S3 Select for Nested Parquet Objects

I have dumped data into a parquet file. When I use SELECT * FROM s3object s LIMIT 1 it gives me the following result. { "name": "John", "age": "45", "country": "USA", "experience": [{ "company": { …
1
vote
2 answers

javascript - convert string to json array

I was using S3 Select to fetch selected data and display it on my front end. I converted the array of bytes to a buffer and then to a string, like below: let dataString = Buffer.concat(records).toString('utf8'); The result I got was a string like…
sumit
  • 15,003
  • 12
  • 69
  • 110
1
vote
1 answer

Code Optimization on s3 read csv and ingest back to s3 bucket

ddict = defaultdict(set)
file_str = query_csv_s3(s3, BUCKET_NAME, filename, sql_exp, use_header)
# read CSV to dataframe
df = pd.read_csv(StringIO(file_str))
fdf = df.drop_duplicates(subset='cleverTapId',…
1
vote
1 answer

AWS S3 Select get data for column with a / in the name

I am trying to use S3 Select to query some data from a CSV file on S3 using the following query: aws s3api select-object-content \ --bucket \ --key \ --expression "select `lineItem/intervalUsageStart` from s3object limit 100"…
jobin
  • 2,600
  • 7
  • 32
  • 59
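One possible reading of the command in this excerpt: inside a double-quoted shell string, backticks trigger command substitution, so the column name may never reach S3 as written. As far as the S3 Select SQL reference describes, identifiers are quoted with double quotes rather than backticks. The following is a minimal sketch under those assumptions, with header names enabled on the CSV input; `build_expression` is a hypothetical helper, not part of any AWS SDK.

```python
def build_expression(column: str, limit: int) -> str:
    """Build an S3 Select query that reads one header-named column.

    Assumes the CSV input is configured to use its header row, so that
    columns can be addressed by name.
    """
    if '"' in column:
        # Escaping of embedded double quotes is out of scope for this sketch.
        raise ValueError("column name must not contain a double quote")
    # Double quotes delimit the identifier, so the '/' is taken literally.
    return f'SELECT s."{column}" FROM s3object s LIMIT {int(limit)}'


expression = build_expression("lineItem/intervalUsageStart", 100)
print(expression)
```

Building the expression in code and passing it to the SDK also sidesteps shell quoting entirely.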
1
vote
2 answers

AWS S3 SelectObjectContent by version ID

Is there a way to run SelectObjectContent (S3 Select) on a specific version of an S3 object using its version ID? I cannot find any reference in the SelectObjectContent documentation to specifying a version ID, like the version ID field we have in GetObject…
1
vote
0 answers

Is it possible to consider the second row as the header for a .csv file in S3 Select?

Is it possible to treat the second row of a .csv file as the header and skip the first row in S3 Select? Example: The structure of my file is as follows: A B C a b c d e f 1 2 3 4 5 6 Now I want to skip A B C and query on a b c d e…
Pallav Doshi
  • 209
  • 2
  • 9
1
vote
1 answer

S3 select query not recognizing data

I generate a dataframe, write it to S3 as a CSV file, and perform a select query on the CSV in the S3 bucket. Based on the query and data I expect to see '4' and '10' printed, but I only see '4'. For some reason S3 is not seeing the '10'. It…
1
vote
1 answer

AWS S3 Select skips missing values in result set

I'm trying to read a parquet file using S3 Select, but running into issues when the data contains missing values - the results returned from S3 select skip all missing values, making it impossible to parse the output. A reproducible example with…
ytsaig
  • 3,267
  • 3
  • 23
  • 27
1
vote
2 answers

S3 Select Python error

I'm trying to get the data from an S3 object. I'm using the S3 Select feature as below: boto3 version : 1.7.59 import boto3 s3 = boto3.client('s3') r = s3.select_object_content( Bucket="bucket", Key="file.json", ExpressionType='SQL', …
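For context, and not as a diagnosis of the error in this truncated excerpt: the `Payload` returned by boto3's `select_object_content` is an event stream, where data arrives in `Records` events alongside `Stats` and `End` events. The sketch below shows one way such a stream could be drained. `collect_records` and `fake_events` are hypothetical names; the fake list only stands in for a real `response["Payload"]`.

```python
from typing import Iterable, Mapping


def collect_records(events: Iterable[Mapping]) -> str:
    """Concatenate the bytes of all Records events and decode them once."""
    chunks = []
    for event in events:
        records = event.get("Records")
        if records is not None:
            chunks.append(records["Payload"])
    # Joining before decoding avoids breaking a multi-byte character
    # that happens to be split across two events.
    return b"".join(chunks).decode("utf-8")


# Stand-in for response["Payload"]; a record may span two events.
fake_events = [
    {"Records": {"Payload": b'{"id": 1}\n{"id"'}},
    {"Records": {"Payload": b": 2}\n"}},
    {"Stats": {"Details": {"BytesScanned": 30}}},
    {"End": {}},
]
text = collect_records(fake_events)
print(text)
```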
1
vote
3 answers

Querying rows by index in S3 Select

With MySQL, the following code: SELECT * FROM TABLE LIMIT 5, 10 would pull the 5th through 10th rows of the table. What is the equivalent for doing this through the SQL engine in S3 Select (PrestoDB I believe)? Is there a rownumber constructor or…
Ajjit Narayanan
  • 632
  • 2
  • 8
  • 18
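To my knowledge, the SELECT syntax documented for S3 Select offers LIMIT but no OFFSET, so one workaround is to request `offset + count` rows and discard the leading ones on the client. Note that MySQL's `LIMIT 5, 10` means "skip 5 rows, return the next 10". A minimal sketch of the client-side part, assuming the rows have already been split into an iterable; `slice_rows` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator


def slice_rows(rows: Iterable[str], offset: int, count: int) -> Iterator[str]:
    """Skip `offset` rows, then yield at most `count` rows."""
    return islice(rows, offset, offset + count)


# Stand-in for rows returned by a query ending in "LIMIT 15" (offset + count).
rows = [f"row{i}" for i in range(1, 21)]
page = list(slice_rows(rows, offset=5, count=10))
print(page[0], page[-1], len(page))
```

The cost of this approach grows with the offset, since every skipped row is still scanned and transferred.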
1
vote
2 answers

S3 Select to pandas DataFrame

I am using S3 Select to read a CSV file and output JSON. Now I want to load the JSON output from S3 Select into a pandas DataFrame. Is it possible to convert S3 Select JSON output to a pandas DataFrame?
thotam
  • 941
  • 2
  • 16
  • 31
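A minimal sketch of one way this could work, assuming the JSON output uses a newline record delimiter so that each line is one JSON object. `parse_json_lines` is a hypothetical helper and the payload is made-up sample data; the resulting list of dicts is a shape that `pandas.DataFrame(records)` accepts.

```python
import json


def parse_json_lines(payload: str) -> list:
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


# Made-up sample standing in for decoded S3 Select output.
payload = '{"name": "a", "n": 1}\n{"name": "b", "n": 2}\n'
records = parse_json_lines(payload)
print(records)
# With pandas installed, pandas.DataFrame(records) would build the frame.
```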
1
vote
1 answer

S3 Select with boto3 - internalerror

Has anyone got "S3 Select" (https://aws.amazon.com/blogs/aws/s3-glacier-select/ , https://aws.amazon.com/about-aws/whats-new/2018/04/amazon-s3-select-is-now-generally-available/) with boto3 (or even cli or another sdk) working? I am getting…
tooptoop4
  • 234
  • 3
  • 15
  • 45
0
votes
0 answers

How to use S3 Select for Nested Parquet Objects?

I am getting started with S3 Select and I am trying to get the size of an array in the inner Parquet object. The following example is one entry from the Parquet file. { "id" : 12, "date" : "2023-07-06", "employee": { "name": "stack…
Dipu
  • 1