I started pulling a Glue table with pyathena last week. One annoying thing I have noticed is that the code below sometimes works and returns a pandas DataFrame, but other times it creates a CSV file and a CSV metadata file in the S3 folder where the physical data (Parquet) is stored and registered in Glue.

I know that using the pandas cursor can produce these two files, but I wonder whether I can access the data without them, because every time they are generated in S3, my read process fails.

Thank you!

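import os

import pandas as pd
from pyathena import connect

# boto3 (used by pyathena) also reads these environment variables on its own
access_key_id = os.getenv('AWS_ACCESS_KEY_ID')
secret_access_key = os.getenv('AWS_SECRET_ACCESS_KEY')
connect1 = connect(s3_staging_dir='s3://xxxxxxxxxxxxx')

df = pd.read_sql("select * from abc.table_name", connect1)
df.head()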
ASU_TY

1 Answer

  1. Go to the Athena console.
  2. Click Settings -> workgroup name -> Edit workgroup.
  3. Update "Query result location" to an S3 location outside the table's data folder.
  4. Check "Override client-side settings".

Note: If you have not set up any other workgroups for your Athena environment, you should find only one workgroup, named "primary".

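This should resolve your problem, because the query result files (the CSV and its metadata file) will then be written to the workgroup's result location instead of the folder that holds the table data. For more information you can read: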

https://docs.aws.amazon.com/athena/latest/ug/querying.html
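If you prefer to script the workgroup change rather than click through the console, here is a minimal sketch using boto3's Athena `update_work_group` call. The helper names `build_workgroup_update` and `apply_workgroup_update`, and the bucket `s3://example-athena-results/output/`, are made up for illustration; it assumes your credentials are allowed to update the workgroup.

```python
def build_workgroup_update(workgroup, output_location):
    """Build the keyword arguments for Athena's UpdateWorkGroup call.

    Hypothetical helper: it only assembles the request, so it can be
    inspected without touching AWS.
    """
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "WorkGroup": workgroup,
        "ConfigurationUpdates": {
            # Equivalent of "Override client-side settings" in the console
            "EnforceWorkGroupConfiguration": True,
            # Equivalent of "Query result location" in the console
            "ResultConfigurationUpdates": {"OutputLocation": output_location},
        },
    }


def apply_workgroup_update(workgroup, output_location):
    """Send the update to Athena (needs boto3 and valid AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("athena")
    return client.update_work_group(
        **build_workgroup_update(workgroup, output_location)
    )


# Example call (placeholder bucket, not run here):
# apply_workgroup_update("primary", "s3://example-athena-results/output/")
```

Pick a result location that is not under the table's data prefix, so Athena's output files never end up next to the Parquet files.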

IDB_LIM