15

I am trying to use boto3 to run a set of queries and don't want to save the data to s3. Instead I just want to get the results and want to work with those results. I am trying to do the following

import boto3

client = boto3.client('athena')
response = client.start_query_execution(
        QueryString='''SELECT * FROM mytable limit 10''',
        QueryExecutionContext={
            'Database': 'my_db'
            }.        
        ResultConfiguration={
            'OutputLocation': 's3://outputpath',
            }
        )

print(response)

But here I don't want to give ResultConfiguration because I don't want to write the results anywhere. But If I remove the ResultConfiguration parameter I get the following error

botocore.exceptions.ParamValidationError: Parameter validation failed:
Missing required parameter in input: "ResultConfiguration"

So it seems like giving s3 output location for writing is mendatory. So what could the way to avoid this and get the results only in response?

muazfaiz
  • 4,611
  • 14
  • 50
  • 88

3 Answers3

4

The StartQueryExecution action indeed requires a S3 output location. The ResultConfiguration parameter is mandatory.

The alternative way to query Athena is using JDBC or ODBC drivers. You should probably use this method if you don't want to store results in S3.

spg
  • 9,309
  • 4
  • 36
  • 41
  • The objective is to run those queries using Python boto3. I know that I can run these using SQL workbench and configuring JDBC drivers there but not sure whether I can run this using Python? – muazfaiz Oct 24 '18 at 14:12
2

You still need to provide s3 as temporary location for Athena to save the data although you want to process the data using python. But you can page through the data as tuple using Pagination API. please refer to the example here. Hope that helps

Tanveer Uddin
  • 1,520
  • 9
  • 15
2

You will have to specify an S3 temp bucket location whenever running the 'start_query_execution' command. However, you can get a result set (a dict) by running the 'get_query_results' method using the query id.

The response (dict) will look like this:

{
'UpdateCount': 123,
'ResultSet': {
    'Rows': [
        {
            'Data': [
                {
                    'VarCharValue': 'string'
                },
            ]
        },
    ],
    'ResultSetMetadata': {
        'ColumnInfo': [
            {
                'CatalogName': 'string',
                'SchemaName': 'string',
                'TableName': 'string',
                'Name': 'string',
                'Label': 'string',
                'Type': 'string',
                'Precision': 123,
                'Scale': 123,
                'Nullable': 'NOT_NULL'|'NULLABLE'|'UNKNOWN',
                'CaseSensitive': True|False
            },
        ]
    }
},
'NextToken': 'string'
}

For more information, see boto3 client doc: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.get_query_results

You can then delete all files in the S3 temp bucket you've specified.