13

I have my .csv files saved in the S3 Bucket. I am able to query the data of S3 using AWS Athena. Is there any way we can connect the lambda function to athena and query the data from lambda function. please help

Thanks

Vipendra Singh
  • 689
  • 2
  • 12
  • 26

4 Answers4

36

Like Chris Pollard said, you can use boto3 to query Athena from a Lambda function.

http://boto3.readthedocs.io/en/latest/reference/services/athena.html

To initialize the Athena client:

import boto3
client = boto3.client('athena')

You will then execute your query:

queryStart = client.start_query_execution(
    QueryString = 'SELECT * FROM myTable',
    QueryExecutionContext = {
        'Database': 'myDatabase'
    }, 
    ResultConfiguration = { 'OutputLocation': 's3://your-bucket/key'}
)

If you want to retrieve the results within Lambda (possibly using a second function, due to time constraints - see docs - also note that you pay per 100ms running time), you would use get_query_execution to determine the status of the query:

queryExecution = client.get_query_execution(QueryExecutionId=queryStart['QueryExecutionId'])

You will need to parse the returned object for the value of the QueryExecution.Status.State field. Continue updating the object using get_query_execution() until the result is Succeeded.

Note: Please don't call get_query_execution() in a continuous loop. Rather, use an exponential backoff algorithm to prevent being throttled by that API. You should use this approach for all API calls.

Then you can use get_query_results() to retrieve the results for processing:

results = client.get_query_results(QueryExecutionId=queryStart['QueryExecutionId'])
Tyrone321
  • 1,702
  • 15
  • 23
  • Hi i got one error. It says "errorMessage": "Parameter validation failed:\nMissing required parameter in input: \"ResultConfiguration\"" – Vipendra Singh May 14 '18 at 06:49
  • Read the documentation for the method call. http://boto3.readthedocs.io/en/latest/reference/services/athena.html#Athena.Client.start_query_execution – Tyrone321 May 15 '18 at 07:57
  • 3
    The command client.get_query_execution(queryStart.QueryExecutionId) gives error AttributeError: 'dict' object has no attribute 'QueryExecutionId'. I had to change to client.get_query_execution(QueryExecutionId=queryStart['QueryExecutionId']) – maurera Feb 28 '20 at 22:26
8

Yes! You can use boto3 to interact with Athena.

Particularly, you're going to probably want the start_query_execution method.

http://boto3.readthedocs.io/en/latest/reference/services/athena.html#Athena.Client.start_query_execution

Chris Pollard
  • 1,625
  • 8
  • 11
  • Exactly this :p – NoorJafri May 11 '18 at 12:11
  • 1
    Hi pollard I tried using the syntax given by them but i am unable to specify the parameters to insert in the syntax. If possible can you please give me a small tutorial. It will help me alot. Actually the documentation provided by boto3 is not enough. Any example will work . Thanks and Regards Vipendra – Vipendra Singh May 14 '18 at 06:51
6

The simplest is to use awscrawler and its custom layer for aws lambda

import awswrangler as wr
sql = "select * from my_table"
df = wr.athena.read_sql_query(
    sql=sql, database="my_table", ctas_approach=True
)
DaveR
  • 1,696
  • 18
  • 24
2

You can use boto3 client to query Athena tables.

You can read more about it here: Simple way to query Amazon Athena in python with boto3

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
Ashish Singh
  • 61
  • 1
  • 6