
I have Parquet data in an S3 location that needs to be made available in Athena for querying. I don't want to do this manually through the web UI or by running the query by hand. Can it be done programmatically, by running code and passing in the S3 location?

I don't want to use Glue, as it is not available in all regions. Could you please help me with code to do this? I am very new to Athena.

Kirk Broadhurst

1 Answer


To query your S3 data from Athena, you only need to tell Athena where the data is located and describe its columns, data types, file format, etc. Just to be clear, there's no 'load': Athena reads the S3 files in place. If you change the underlying files, the results of your Athena queries change accordingly.

Instead of 'loading' to Athena, it's better to think of registering your data with Athena. You do this by creating an external table. The answer to "How to Query parquet data from Amazon Athena?" has a good example.
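Since the question asks for code, here is a minimal sketch of registering a table programmatically with boto3. The function names, the column list, and the bucket paths are hypothetical placeholders; the only AWS call used is the Athena client's start_query_execution. It assumes you already know the column names and types, and that you have an S3 location where Athena can write its query output.

```python
def build_create_table_ddl(database, table, columns, s3_location):
    """Return a CREATE EXTERNAL TABLE statement for Parquet files in S3.

    columns is a list of (name, athena_type) pairs, e.g. [("id", "bigint")].
    """
    if not columns:
        raise ValueError("at least one column is required")
    # LOCATION should point at a folder prefix, so make sure it ends in a slash.
    location = s3_location if s3_location.endswith("/") else s3_location + "/"
    column_sql = ",\n".join(
        f"  `{name}` {athena_type}" for name, athena_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{column_sql}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


def register_table(ddl, database, output_location, region_name=None):
    """Submit the DDL to Athena and return the query execution id."""
    # Imported here so the DDL builder above works without boto3 installed.
    import boto3

    athena = boto3.client("athena", region_name=region_name)
    response = athena.start_query_execution(
        QueryString=ddl,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical usage: database, table, columns and buckets are placeholders.
# ddl = build_create_table_ddl(
#     "my_database", "events",
#     [("id", "bigint"), ("name", "string")],
#     "s3://my-bucket/events/",
# )
# register_table(ddl, "my_database", "s3://my-bucket/athena-results/")
```

The statement runs asynchronously, so the returned execution id is what you would poll to find out whether the table was actually created.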

If you don't know the correct syntax to create the external table, I'd suggest using Glue to generate the table definition as a one-time exercise. Once the table exists, you can run SHOW CREATE TABLE my_table in Athena, which shows you the statement that would create that table.
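SHOW CREATE TABLE can also be run from code rather than the console. The sketch below is one way to do that with a boto3 Athena client, which is passed in as a parameter; run_athena_query is a hypothetical helper name, and the database and output location in the usage comment are placeholders. It polls until the query finishes and reads only the first page of results.

```python
import time


def run_athena_query(athena, query, database, output_location, poll_seconds=1.0):
    """Run a query, wait for it to finish, and return each result row as text."""
    execution_id = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena queries are asynchronous, so poll until a terminal state is reached.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=execution_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"query {status['State']}: {reason}")

    # Only the first page of results is read here.
    result = athena.get_query_results(QueryExecutionId=execution_id)
    return [
        " ".join(cell.get("VarCharValue", "") for cell in row["Data"])
        for row in result["ResultSet"]["Rows"]
    ]


# Hypothetical usage (requires boto3 and AWS credentials):
# import boto3
# athena = boto3.client("athena")
# for line in run_athena_query(athena, "SHOW CREATE TABLE my_table",
#                              "my_database", "s3://my-bucket/athena-results/"):
#     print(line)
```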

Glue is usually not necessary to use Athena. It's an easy way to get started, but beyond that it is another step that takes time and adds complexity.

Kirk Broadhurst
  • In my case, the schema for the CREATE TABLE statement changes every time (I am accessing different data locations in S3), and I don't want to change the schema manually every time I access different data from S3 using Athena. Is there any way to generate the CREATE query dynamically / programmatically? – Dhruvajyoti Chatterjee Sep 03 '18 at 13:39
  • If the data has different schemas, that would be multiple tables. One option is to have an AWS Glue Crawler perform the cataloging for you. – Kirk Broadhurst Sep 03 '18 at 13:57
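If a Glue Crawler is not an option, one alternative (not part of the answer above) is to read the column names and types from the Parquet file itself and generate the column list for the CREATE statement from that. The sketch below assumes the third-party pyarrow package and a Parquet file that is readable at a local path. The type mapping is an assumption that covers only common primitive types, and both function names are hypothetical.

```python
# Assumed mapping from Arrow type names (as printed by pyarrow) to Athena
# column types. It covers only common primitive types; extend it as needed.
ARROW_TO_ATHENA = {
    "int8": "tinyint",
    "int16": "smallint",
    "int32": "int",
    "int64": "bigint",
    "float": "float",
    "double": "double",
    "bool": "boolean",
    "string": "string",
    "binary": "binary",
    "date32[day]": "date",
}


def athena_type(arrow_type):
    """Translate an Arrow type (or its string form) into an Athena type name."""
    name = str(arrow_type)
    # Arrow prints timestamps with a unit, e.g. "timestamp[us]".
    if name.startswith("timestamp"):
        return "timestamp"
    try:
        return ARROW_TO_ATHENA[name]
    except KeyError:
        raise ValueError(f"no Athena mapping for Arrow type {name!r}") from None


def columns_from_parquet(path):
    """Return (column_name, athena_type) pairs read from a Parquet file's schema."""
    # Third-party dependency, imported lazily: pip install pyarrow
    import pyarrow.parquet as pq

    schema = pq.read_schema(path)
    return [(field.name, athena_type(field.type)) for field in schema]
```

The resulting pairs can be joined into the column section of a CREATE EXTERNAL TABLE statement, so each new S3 location gets its own table without the schema being written by hand. Nested types such as structs and lists are deliberately rejected here rather than guessed at.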