I wanted to ask the AWS community a question.
I recently switched to Athena and have noticed the following:
Querying data through PyAthena (the Python client) takes much longer than running the same query directly in Athena. I have a database of customer data. When I execute a query in Athena, it takes less than 60 seconds to get the data, but when I execute the same query through PyAthena, it takes about 40 minutes.
Here is my Python code:
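from pyathena import connect
import pandas as pd
# Connect to Athena; query results are staged in the given S3 location
cnxn = connect(s3_staging_dir='URL Address for my Athena results', region_name='us-east-2')
sql = ''' SELECT * from some query '''
df = pd.read_sql(sql, cnxn)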
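In case it helps narrow this down, here is a minimal sketch of how I could measure where the time goes. The helper name `timed` is my own and not part of PyAthena or pandas, and the stand-in call at the end is only there so the snippet runs without AWS access.

```python
import time

def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), print the elapsed wall-clock time, return the result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} s")
    return result

# Stand-in for a slow call (hypothetical), so the helper can be tried without Athena
rows = timed("stand-in fetch", lambda n: list(range(n)), 1000)
```

With the connection above, the same helper could wrap the two steps separately, e.g. cursor = cnxn.cursor(), then timed("execute", cursor.execute, sql) followed by timed("fetch", cursor.fetchall), to see whether the 40 minutes are spent running the query or retrieving the rows.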
Can someone help me understand why this happens? Am I doing anything wrong?
Thank you
----EDITED----
I am running the query in SageMaker, in its Jupyter notebook environment.