1

I am using S3 Select to read the csv file and outputting into JSON. Now I want the JSON Output from S3 Select into pandas dataframe. Is it possible to convert S3 Select JSON output to pandas dataframe?

Josh Lee
  • 171,072
  • 38
  • 269
  • 275
thotam
  • 941
  • 2
  • 16
  • 31

2 Answers2

1

Not Sure if you still are looking for an answer but this worked for me:

s3 = boto3.client('s3')
bucket = bucket_name

file_name = file_key

sql_stmt = """SELECT S.* FROM s3object S LIMIT 10"""


req = s3.select_object_content(
    Bucket=bucket_name,
    Key=file_key,
    ExpressionType='SQL',
    Expression=sql_stmt,
    InputSerialization = {'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization = {'CSV': {}},
)

records = []
for event in req['Payload']:
    if 'Records' in event:
        records.append(event['Records']['Payload'])
    elif 'Stats' in event:
        stats = event['Stats']['Details']


file_str = ''.join(r.decode('utf-8') for r in records)

select_df = pd.read_csv(StringIO(file_str))
print(len(select_df))
df = pd.DataFrame(select_df)
print(df)
Sumedha Nagpal
  • 163
  • 2
  • 15
0

Correct way to look at it is knowing its type('event' type), it will be dict. so converting dict to pandas is easy:

dict_ = {'key 1': 'value 1', 'key 2': 'value 2', 'key 3': 'value 3'}
pd.DataFrame([dict_])
Aramis NSR
  • 1,602
  • 16
  • 26