
The Python code below can fetch data from a pre-configured Athena table when it is run on a local computer. But it automatically creates an S3 bucket to store temporary tables and metadata. The automatically created bucket name looks like aws-athena-query-results-<account_id>-<aws-region>. Is there a way to tell awswrangler to use another S3 location?

import awswrangler as wr

db_name = "test_db"
table_name = "test_table"
query = f"SELECT * FROM {db_name}.{table_name}"

df = wr.athena.read_sql_query(sql=query, database=db_name)
d.b
  • Maybe this will be of help: https://aws.amazon.com/blogs/machine-learning/configure-a-custom-amazon-s3-query-output-location-and-data-retention-policy-for-amazon-athena-data-sources-in-amazon-sagemaker-data-wrangler/ – Jimson James Feb 22 '23 at 17:32
  • Also, you can pass a custom path to the `s3_output` arg. – Jimson James Feb 22 '23 at 17:39
  • @JimsonJames, I saw that link too and it seems relevant, but I am not using SageMaker. `s3_output` did work, and you can make that an answer. Ideally, I would like to set a default S3 location so I don't have to provide `s3_output` every time, but it looks like `s3_output` does the job. – d.b Feb 22 '23 at 17:52

1 Answer


Use the `s3_output` argument to set a custom output path per query, as shown below.

import awswrangler as wr

# Query Athena using the wrangler library; s3_output points at an S3 prefix you control
S3_OUTPUT_LOCATION = "s3://my-bucket/athena-results/"  # placeholder path
query = "SELECT * FROM my_table LIMIT 100"
df = wr.athena.read_sql_query(query, database="my_database", s3_output=S3_OUTPUT_LOCATION)

Note that there are some edge cases where this does not behave correctly, like the one raised in this ticket: "User specified s3_output not handled correctly in athena.read_sql_query with unload_approach=True".

Unfortunately, as of now there is no way to specify this location as part of the wrangler global config.
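
If you want a per-session default without repeating the argument on every call, one workaround is to bind it into a wrapper yourself. Here is a minimal sketch using `functools.partial`; `S3_OUTPUT_LOCATION` is just a placeholder local variable, not an awswrangler setting:

import functools

import awswrangler as wr

# Placeholder: replace with a bucket/prefix you control
S3_OUTPUT_LOCATION = "s3://my-bucket/athena-results/"

# Bind the custom output location once so every query uses it by default
read_sql_query = functools.partial(
    wr.athena.read_sql_query, s3_output=S3_OUTPUT_LOCATION
)

df = read_sql_query("SELECT * FROM my_table LIMIT 100", database="my_database")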

Jimson James