I would like to create a database in Athena via API. I have parquet files in S3 that I would like to query using the API and I would like to use Athena for the query.
Anyway I can create a database via API for Athena?
I would like to create a database in Athena via API. I have parquet files in S3 that I would like to query using the API and I would like to use Athena for the query.
Anyway I can create a database via API for Athena?
Creating a database in Athena can be done by creating your own API request or using the SDK.
Here is a Python example using the SDK:
import boto3
client = boto3.client('athena')
config = {'OutputLocation': 's3://TEST_BUCKET/'}
client.start_query_execution(
QueryString = 'create database TEST_DATABASE',
ResultConfiguration = config
)
There are SDKs available for Java, .NET, Node, PHP, Python, Ruby, Go, and C++. If you want to create your own API requests, I recommend developing a good understanding of the signing process. You could also use the AWS CLI as such:
$ aws athena start-query-execution --query-string "CREATE database ATHENA_TEST_TWO" --result-configuration "OutputLocation=s3://TEST_BUCKET/"
Once you have a database created, you can then pass the database name in your query requests.
context = {'Database': 'TEST_DATABASE'}
client.start_query_execution(QueryString='CREATE TABLE ...',
QueryExecutionContext = context,
ResultConfiguration=config)
To see some DDL creating a table from Parquet files see the following examples on the Amazon Athena User Guide.
Edit In reponse to @condo1234's questions:
But how do I associated a database with a file in S3?
The short answer is you don't. You associate a table with files sharing a prefix in a bucket in S3.
For example, say I want to create a table to analyze data held in s3://TEST_BUCKET
. Through the AWS Console, I can use the poorly named "Create Folder" button to create a prefix called one-table-many-files/
. I then created two csv files:
f1.csv
Codd,1923
Ellison,1944
Chamberlin,1944
Boyce,1947
f2.csv
Hopper,1906
Floyd,1953
Moriarty Wolf Chambers,1980
I then uploaded these text files to the example bucket/prefix combination s3://TEST_BUCKET/one-table-many-files/
I ran the following DDL:
CREATE EXTERNAL TABLE php_test.computer_scientists (
name string,
year_born int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3://TEST_BUCKET/one-table-many-files/';
And ran the following SQL Statement:
SELECT * FROM php_test.computer_scientists;
I got the following results back, with data from both files in the bucket + prefix combination specified in the DDL.
"name","year_born"
"Hopper","1906"
"Floyd","1953"
"Moriarty Wolf Chambers","1980"
"Codd","1923"
"Ellison","1944"
"Chamberlin","1944"
"Boyce","1947"
Notice that I am using the word "prefix" and not "folder"? That is because S3 has no concept of a folder! These prefixes are useful however, as they allow for Athena Partitioning.
Per your request, here is a php example as well.
<?php
print('Welcome to PHP');
require 'aws-autoloader.php';
$athena = new Aws\Athena\AthenaClient(['version' => 'latest', 'region' => 'us-east-1' ]);
$athena->StartQueryExecution([
'QueryString' => 'CREATE DATABASE php_test;',
'ResultConfiguration' => [
'OutputLocation' => 's3://TEST_BUCKET/', // REQUIRED
],
]);
?>
See the PHP SDK Documentation for more.
What about java and AWS SDK example. Information on how to create a database is not reflected in the documentation and examples on AWS, but the scientific poke method does wonders. Use this https://docs.aws.amazon.com/athena/latest/ug/code-samples.html example on AWS and just change this
QueryExecutionContext queryExecutionContext =
QueryExecutionContext.builder().catalog("YourAWSCatalogName").build()
Where YourAWSCatalogName is the name of Data Source in your Athena console on AWS.
Next, use a query with DDL to create database.
It is a mystery to me why it was impossible to write about this in the same Athena AWS example.