17

I would like to create a database in Athena via API. I have parquet files in S3 that I would like to query using the API and I would like to use Athena for the query.

Anyway I can create a database via API for Athena?

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
condo1234
  • 3,285
  • 6
  • 25
  • 34

2 Answers2

24

Creating a database in Athena can be done by creating your own API request or using the SDK.

Here is a Python example using the SDK:

import boto3

client = boto3.client('athena')

config = {'OutputLocation': 's3://TEST_BUCKET/'}

client.start_query_execution(
                             QueryString = 'create database TEST_DATABASE', 
                             ResultConfiguration = config
)

There are SDKs available for Java, .NET, Node, PHP, Python, Ruby, Go, and C++. If you want to create your own API requests, I recommend developing a good understanding of the signing process. You could also use the AWS CLI as such:

$ aws athena start-query-execution --query-string "CREATE database ATHENA_TEST_TWO" --result-configuration "OutputLocation=s3://TEST_BUCKET/"

Once you have a database created, you can then pass the database name in your query requests.

context = {'Database': 'TEST_DATABASE'}
client.start_query_execution(QueryString='CREATE TABLE ...', 
                             QueryExecutionContext = context, 
                             ResultConfiguration=config)

To see some DDL creating a table from Parquet files see the following examples on the Amazon Athena User Guide.

Edit In reponse to @condo1234's questions:

But how do I associated a database with a file in S3?

The short answer is you don't. You associate a table with files sharing a prefix in a bucket in S3.

For example, say I want to create a table to analyze data held in s3://TEST_BUCKET. Through the AWS Console, I can use the poorly named "Create Folder" button to create a prefix called one-table-many-files/. I then created two csv files:

f1.csv

Codd,1923
Ellison,1944
Chamberlin,1944
Boyce,1947

f2.csv

Hopper,1906
Floyd,1953
Moriarty Wolf Chambers,1980

I then uploaded these text files to the example bucket/prefix combination s3://TEST_BUCKET/one-table-many-files/

I ran the following DDL:

CREATE EXTERNAL TABLE php_test.computer_scientists (
  name string,
  year_born int
  ) 
ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
 LINES TERMINATED BY '\n'
LOCATION 's3://TEST_BUCKET/one-table-many-files/'; 

And ran the following SQL Statement:

SELECT * FROM php_test.computer_scientists;

I got the following results back, with data from both files in the bucket + prefix combination specified in the DDL.

"name","year_born"
"Hopper","1906"
"Floyd","1953"
"Moriarty Wolf Chambers","1980"
"Codd","1923"
"Ellison","1944"
"Chamberlin","1944"
"Boyce","1947"

Notice that I am using the word "prefix" and not "folder"? That is because S3 has no concept of a folder! These prefixes are useful however, as they allow for Athena Partitioning.

Per your request, here is a php example as well.

<?php
print('Welcome to PHP');

require 'aws-autoloader.php';

$athena = new Aws\Athena\AthenaClient(['version' => 'latest', 'region' => 'us-east-1' ]);

$athena->StartQueryExecution([
    'QueryString' => 'CREATE DATABASE php_test;',
    'ResultConfiguration' => [
        'OutputLocation' => 's3://TEST_BUCKET/', // REQUIRED
    ],
]);

?>

See the PHP SDK Documentation for more.

Zerodf
  • 2,208
  • 18
  • 26
  • But how do I associated a database with a file in S3? I am using PHP – condo1234 Dec 14 '17 at 05:59
  • @condo1234 The short answer is you don't. You associate a table with a location in S3. For example, if you want to associate a file test.csv with a table test_table, you place test.csv in a given location in s3 (say s3://myfles/test_table_data/), place test.csv in the location (s3://myfles/test_table_data/test.csv), and then point the table DDL at the location (LOCATION 's3://myfles/test_table_data/'). This allows you to add more data simply by uploading additional files to the "folder" in S3. However, you can't mix and match files of different formats in the same location. – Zerodf Jan 27 '18 at 14:36
0

What about java and AWS SDK example. Information on how to create a database is not reflected in the documentation and examples on AWS, but the scientific poke method does wonders. Use this https://docs.aws.amazon.com/athena/latest/ug/code-samples.html example on AWS and just change this

QueryExecutionContext queryExecutionContext = 
QueryExecutionContext.builder().catalog("YourAWSCatalogName").build()

Where YourAWSCatalogName is the name of Data Source in your Athena console on AWS.

Next, use a query with DDL to create database.

It is a mystery to me why it was impossible to write about this in the same Athena AWS example.

avgolubev
  • 101
  • 8