Questions tagged [amazon-s3-select]

Amazon S3 Select enables applications to retrieve only a subset of data from an Amazon S3 object by using simple SQL expressions.

See: SQL Functions - Amazon Simple Storage Service
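For orientation, a call to S3 Select through boto3's `select_object_content` takes the bucket, the key, the SQL expression, and input/output serialization settings. The sketch below only assembles those keyword arguments as a plain dict, so it runs without AWS access. The bucket, key, and column names are hypothetical placeholders.

```python
from typing import Optional


def build_select_request(bucket: str, key: str, expression: str,
                         input_serialization: dict,
                         output_serialization: Optional[dict] = None) -> dict:
    """Return keyword arguments for boto3's s3.select_object_content(**kwargs)."""
    if output_serialization is None:
        output_serialization = {"CSV": {}}  # default: CSV output with default settings
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",  # the only expression type S3 Select accepts
        "Expression": expression,
        "InputSerialization": input_serialization,
        "OutputSerialization": output_serialization,
    }


request = build_select_request(
    bucket="example-bucket",      # hypothetical bucket
    key="data/sample.csv",        # hypothetical key
    expression="SELECT s.\"name\" FROM S3Object s LIMIT 10",
    input_serialization={"CSV": {"FileHeaderInfo": "USE"}},  # use header row for names
)
print(request["Expression"])

# With boto3 installed and credentials configured, the dict could be sent as:
#   import boto3
#   resp = boto3.client("s3").select_object_content(**request)
```

Keeping the request as data makes it easy to inspect or log the exact expression before sending it.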

91 questions

Use of NUL character as delimiter on AWS S3 Object Select

I'm trying to use S3 Select (SelectObjectContent) to parse a CSV file that has a NUL delimiter. The S3 API command I've tried is: aws s3api select-object-content \ --bucket "testbucket" \ --key test.csv.gz \ --expression "Select * from s3object s" \ …
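A sketch of the InputSerialization this question is after, assuming the file is gzip-compressed CSV with no header row. Whether S3 Select accepts U+0000 as a `FieldDelimiter` is not established here and should be checked against the service. The example also shows that JSON cannot carry a raw NUL inside a string, so a JSON-encoded parameter has to spell it as the escape \u0000.

```python
import json

# Hypothetical InputSerialization for a gzip-compressed CSV whose fields are
# separated by NUL (U+0000). Service acceptance of NUL is an assumption.
input_serialization = {
    "CSV": {
        "FileHeaderInfo": "NONE",   # assumed: no header row
        "FieldDelimiter": "\x00",   # the NUL character
    },
    "CompressionType": "GZIP",
}

# json.dumps escapes control characters, so NUL becomes the six characters \u0000.
# That escaped form is what a JSON-valued parameter would have to contain.
encoded = json.dumps(input_serialization)
print(encoded)
```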

S3 Select - How can I query with a non-standard timestamp comparison?

I'm using an S3 bucket where the data is organized into files by ID and year/month, meaning one file per ID per month. In each (csv.gz) file, each record has a timestamp in the format YYYY-MM-dd HH:mm:ss (note the missing T). Now, when querying the…
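One approach that avoids parsing the timestamp at all, assuming every value is zero-padded and fixed-width exactly as YYYY-MM-dd HH:mm:ss: in that layout, string order matches chronological order, so a plain string range in the WHERE clause selects the right rows. The column name `timestamp` is a hypothetical header, and the query assumes `FileHeaderInfo` is set to `USE`.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # "YYYY-MM-dd HH:mm:ss", space instead of "T"


def time_range_expression(column: str, start: datetime, end: datetime) -> str:
    """Select rows with start <= column < end using plain string comparison.

    Valid only if every stored timestamp is zero-padded and fixed-width,
    so that lexicographic order equals chronological order.
    """
    lo, hi = start.strftime(FMT), end.strftime(FMT)
    return ("SELECT * FROM S3Object s "
            f"WHERE s.\"{column}\" >= '{lo}' AND s.\"{column}\" < '{hi}'")


# Local sanity check: unordered times sort the same way as their strings.
base = datetime(2019, 12, 31, 23, 59, 59)
times = [base + timedelta(seconds=n) for n in (86400 * 40, 1, 3600, 0, 2)]
texts = [t.strftime(FMT) for t in times]
assert sorted(texts) == [t.strftime(FMT) for t in sorted(times)]

print(time_range_expression("timestamp", datetime(2020, 1, 1), datetime(2020, 2, 1)))
```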

S3 SELECT on Parquet file doesn't return any records

import boto3 s3 = boto3.client('s3') resp = s3.select_object_content( Bucket='s3-nb-demo', Key='sample_data.parquet', ExpressionType='SQL', Expression="SELECT * FROM s3object s where s.\"Name\" = 'NB'", InputSerialization =…
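When a query appears to return nothing, one thing worth checking is how the response is read: boto3 delivers the rows as `Records` events inside `resp['Payload']`, alongside `Stats` and `End` events. The helper below is a minimal sketch that drains such a stream. It is exercised here on hand-built events shaped like boto3's, not on a live response, and the record contents are illustrative.

```python
from typing import Iterable, Optional, Tuple


def drain_select_events(events: Iterable[dict]) -> Tuple[bytes, Optional[dict], bool]:
    """Concatenate Records payloads from an S3 Select event stream.

    Returns (record_bytes, stats_details, saw_end). With boto3, `events`
    would be resp["Payload"] from select_object_content.
    """
    chunks = []
    stats = None
    saw_end = False
    for event in events:
        if "Records" in event:
            # A chunk may end mid-row, so join all chunks before splitting rows.
            chunks.append(event["Records"]["Payload"])
        elif "Stats" in event:
            stats = event["Stats"]["Details"]
        elif "End" in event:
            saw_end = True
    return b"".join(chunks), stats, saw_end


# Hand-built events mimicking the shape of a real stream (illustrative data).
fake_events = [
    {"Records": {"Payload": b'{"Name":"NB"}\n{"Na'}},
    {"Records": {"Payload": b'me":"NB"}\n'}},
    {"Stats": {"Details": {"BytesScanned": 100, "BytesProcessed": 100, "BytesReturned": 28}}},
    {"End": {}},
]
data, stats, ended = drain_select_events(fake_events)
print(data.decode("utf-8").splitlines())
```

If the `Stats` details report `BytesReturned` as 0, the query itself matched nothing, which points at the expression or serialization settings rather than at how the stream is consumed.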

Querying data on S3 Object

I'm trying to query data in a JSON file using S3 Select: { "groups_id": { "307225": { "created_at": "2015-02-10T17:24:15-08:00", "updated_at": "2017-09-06T17:25:22-07:00", "name":…
Prasad

Amazon S3 Select Query: Column values get left-shifted in CSV output when we query a non-existent key

I have a JSON file in S3 of the format: { "A":"a", "C":{"C1":"c1","C2":"c2"}, "E":"e" } And when I query like select S3Object.A,S3Object.C1,S3Object.C2,S3Object.E from S3Object I get below in CSV output: A,C1,C2,E a,e I understand that the…
Supriya

Does EMR cluster size matter when reading data from S3 using Spark?

Setup: latest (5.29) AWS EMR, Spark, 1 master, 1 node. Step 1: I used S3 Select to parse a file and collect all file keys to pull from S3. Step 2: I use PySpark to iterate over the keys in a loop and do the following: spark .read …
Jason B

S3 Select - get last X seconds of data from CSV

I'm using the JavaScript AWS S3 SDK to extract data from a CSV on my server. This is done via the SQL query statement below: SELECT timestamps, parameterX FROM S3Object WHERE ${timestamp_header} > '${startTime}' and ${timestamp_header} <…
mfcss

S3 Select Invalid Path component

I'm trying to figure out how to use AWS S3 Select. Everything seems pretty straightforward, but the following query just doesn't work: select r.value from S3Object[*].outputs.private_subnets r The above returns Invalid Path component. This…
Alex Zel

How to s3-select all data within inner array of parquet file

I have Parquet files on S3 which need to be queried using S3 Select. The Parquet files are generated from JSON files with inner arrays. The S3 Select query can get the first array, but if I try to query the records in the inner array it fails to…

How do I get an s3 select query to return individual rows

With data structured like so { "rows": [ { "rowId": "IDP_2z8dfj9KbB1hrPI_1554508960_1_1", "version": "1554508960", "lastUpdatedDate": 1554508960604, "createdAt": 1554508960604, …
Aaron Wilson
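For a single JSON document shaped like the one above, S3 Select's FROM clause accepts a path with wildcards, and a path such as `S3Object[*].rows[*]` should yield one record per element of the `rows` array (worth confirming against the S3 Select SQL reference). The sketch below shows request parameters under that assumption, with Bucket and Key omitted, plus a local stand-in that computes the expected per-row result with the standard `json` module so the shape can be checked offline.

```python
import json

# Hypothetical request parameters (Bucket and Key omitted). The file is read as
# one JSON document; the FROM path is assumed to unnest the "rows" array.
request = {
    "ExpressionType": "SQL",
    "Expression": "SELECT r.rowId, r.version FROM S3Object[*].rows[*] r",
    "InputSerialization": {"JSON": {"Type": "DOCUMENT"}},
    "OutputSerialization": {"JSON": {"RecordDelimiter": "\n"}},
}


def local_rows(document_text: str) -> list:
    """Local stand-in for the expected result: one dict per element of "rows"."""
    doc = json.loads(document_text)
    return [{"rowId": r["rowId"], "version": r["version"]} for r in doc["rows"]]


sample = '{"rows": [{"rowId": "a", "version": "1"}, {"rowId": "b", "version": "2"}]}'
for row in local_rows(sample):
    print(json.dumps(row))
```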

S3 Select on CSV file - how to match substring

I have a CSV file uploaded to an S3 bucket. I want to return rows that match a substring of a field Display. What's the right SELECT syntax? This returns 0 rows: "select * from s3object s where 'substring' in s.Display LIMIT 100" Thanks for your…
TrickiDicki
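In SQL, `IN` tests membership in a list of values rather than searching within a string, which would explain zero rows. A common way to match a substring is `LIKE` with `%` wildcards, which S3 Select's SQL supports. A minimal sketch that builds such a query, assuming the CSV header row is used so `Display` can be referenced by name:

```python
def substring_query(column: str, needle: str, limit: int = 100) -> str:
    """Build an S3 Select query matching rows whose column contains `needle`.

    Single quotes are doubled so the needle stays a valid SQL string literal.
    Note that % and _ inside `needle` would still act as LIKE wildcards.
    """
    literal = needle.replace("'", "''")
    return ("SELECT * FROM S3Object s "
            f"WHERE s.\"{column}\" LIKE '%{literal}%' LIMIT {int(limit)}")


# Referring to the column by name assumes the CSV header row is used:
input_serialization = {"CSV": {"FileHeaderInfo": "USE"}}

print(substring_query("Display", "O'Brien"))
```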

Query S3 in parallel with SQL and partitioning

Is it possible to run simple concurrent SQL queries on an S3 file with partitioning? The problem is that it looks like you have to choose 2 options out of 3. You can make concurrent SQL queries against S3 with S3 Select. But S3 Select doesn't support…
VB_

Amazon S3 SELECT returning garbage data from a .csv file in S3 Bucket (using .NET SDK)

Below are two methods that are part of my state machine in AWS. First, the method that uses S3 SELECT to obtain data from a CSV file: /// Use S3 Select in order to obtain the data from the source and return it …
JamesMatson

Count reoccurring variable within AWS-S3 bucket using S3-Select query

I'm running a Python script to query an AWS S3 bucket using S3 Select. I'm importing a variable from a txt file and want to pass it into the S3 Select query. I also want to count all occurrences of the imported variable (within a specified…
QAE
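A sketch of passing values read from a text file into a counting query, assuming one value per line and a hypothetical column `user_id`. `COUNT(*)` is an aggregate that S3 Select supports, so each query should come back as a single record holding the count. Single quotes in each value are doubled so it stays a valid SQL string literal.

```python
import io


def count_query(column: str, value: str) -> str:
    """S3 Select expression counting rows where `column` equals `value`."""
    literal = value.replace("'", "''")  # keep the value a valid SQL string literal
    return f"SELECT COUNT(*) FROM S3Object s WHERE s.\"{column}\" = '{literal}'"


def queries_from_file(fh, column: str) -> dict:
    """Map each non-blank line of a text file to its count query."""
    return {line.strip(): count_query(column, line.strip())
            for line in fh if line.strip()}


# Stand-in for open("values.txt"); the file name and column are hypothetical.
fake_file = io.StringIO("alpha\n\nbeta\n")
for value, expr in queries_from_file(fake_file, "user_id").items():
    print(value, "->", expr)
```

Each expression would then be sent as its own select_object_content request; the alternative of selecting all matching rows once and counting locally trades fewer requests for more returned data.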

How to retrieve partial S3 object values by key

Given an S3 bucket called my-bucket that includes an object with the key my-object, is it possible to retrieve values from the object if the object value consists of a list of key/value pairs? i.e. if my-object contains a file with the following…
cvoep28