Why does PyAthena generate a CSV and a CSV metadata file in my S3 location when reading a Glue table?

Question

I started pulling Glue tables with PyAthena last week. One annoying thing I've noticed: when I run the code below, sometimes it works and returns a pandas DataFrame, but other times it creates a CSV file and a CSV metadata file in the S3 folder where the table's physical data (Parquet) is stored and registered in Glue.

I know that using the pandas cursor can produce these two files, but I wonder whether I can read the data without them being generated, because every time these two files appear in S3, my read process fails.

Thank you!

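import os
import pandas as pd
from pyathena import connect

access_key_id = os.getenv('AWS_ACCESS_KEY_ID')
secret_access_key = os.getenv('AWS_SECRET_ACCESS_KEY')
# Athena writes each query's results (.csv + .csv.metadata) to this location
connect1 = connect(aws_access_key_id=access_key_id,
                   aws_secret_access_key=secret_access_key,
                   s3_staging_dir='s3://xxxxxxxxxxxxx')

df = pd.read_sql("select * from abc.table_name", connect1)
df.head()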

IDB_LIM · Answer 1 · 2020-02-15T19:51:37.117


1. Go to the Athena console.
2. Click Settings -> (workgroup name) -> Edit workgroup.
3. Update "Query result location" to an S3 path outside the table's data folder.
4. Select "Override client-side settings".

Note: If you have not set up any other workgroups for your Athena environment, you should find only one workgroup, named "primary".
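If you would rather script the same workgroup change than click through the console, a sketch using boto3's Athena client might look like this. It assumes boto3 is installed and your credentials may call `athena:UpdateWorkGroup`; the helper names and the results path `s3://my-athena-results/query-results/` are hypothetical. `EnforceWorkGroupConfiguration` is the API counterpart of "Override client-side settings", so a client-supplied `s3_staging_dir` is ignored once it is set.

```python
from typing import Optional


def build_workgroup_update(output_location: str) -> dict:
    """Build the ConfigurationUpdates payload for athena.update_work_group."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        # Counterpart of ticking "Override client-side settings"
        "EnforceWorkGroupConfiguration": True,
        # Counterpart of editing "Query result location"
        "ResultConfigurationUpdates": {"OutputLocation": output_location},
    }


def apply_workgroup_update(workgroup: str, output_location: str,
                           region_name: Optional[str] = None) -> None:
    """Send the update to Athena (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_workgroup_update works without boto3

    athena = boto3.client("athena", region_name=region_name)
    athena.update_work_group(
        WorkGroup=workgroup,
        ConfigurationUpdates=build_workgroup_update(output_location),
    )


if __name__ == "__main__":
    # Hypothetical results location; keep it outside every table's data folder
    print(build_workgroup_update("s3://my-athena-results/query-results/"))
    # apply_workgroup_update("primary", "s3://my-athena-results/query-results/")
```

The call that actually modifies the workgroup is left commented out in the usage example so the payload can be inspected before anything changes in your account.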

This should resolve your problem. For more information, see:

https://docs.aws.amazon.com/athena/latest/ug/querying.html


answered Feb 14 '20 at 00:54
