I am looking for a way to query AWS DynamoDB data with SQL syntax using Amazon EMR.
I have my DynamoDB table set up and ready. How can I import or query the data using Hue? The DynamoDB table is around 8 GB in size.
Follow the steps below.
Using Hive to query non-live (exported) DynamoDB data:
1) Export the data from DynamoDB to Hive
See the section "Exporting Data from DynamoDB" in the EMR Hive Commands documentation (link below)
2) Use Amazon EMR to query the data stored in DynamoDB
See the section "Querying Data in DynamoDB" in the EMR Hive Commands documentation (link below)
3) Use Hue to run the queries (i.e. run the Hive queries from the Hue workbench)
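As a rough illustration of the non-live approach, the sketch below copies the data once into a native Hive table and then queries only that copy. It assumes a DynamoDB-backed Hive table like the hivetable1 defined in the sample code at the end of this answer; the name hivetable1_snapshot and the aggregate query are hypothetical and only for illustration.

```sql
-- Hypothetical snapshot table: a native Hive table stored on the cluster,
-- with the same columns as the DynamoDB-backed table hivetable1.
CREATE TABLE hivetable1_snapshot (col1 string, col2 bigint, col3 array<string>);

-- One-time copy: this statement scans the DynamoDB table,
-- so it consumes read capacity while it runs.
INSERT OVERWRITE TABLE hivetable1_snapshot
SELECT * FROM hivetable1;

-- Later queries read only the snapshot and do not touch DynamoDB.
SELECT col2, COUNT(*) AS item_count
FROM hivetable1_snapshot
GROUP BY col2;
```

The trade-off is that the snapshot goes stale as the DynamoDB table changes, so the copy step has to be rerun whenever fresher data is needed.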
Using Hive to query live DynamoDB data:
1) Create a Hive table that maps to the DynamoDB table
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/EMR_Interactive_Hive.html
2) Once the Hive table is created, queries that you run against it read their data directly from the live DynamoDB table
Disadvantage: every query consumes the DynamoDB table's read capacity units (or write capacity units, if the query writes to the table). In other words, each query execution costs you.
Sample code:
CREATE EXTERNAL TABLE hivetable1 (col1 string, col2 bigint, col3 array<string>)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "dynamodbtable1",
"dynamodb.column.mapping" = "col1:name,col2:year,col3:holidays");
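Once the mapping exists, the table can be queried from the Hue Hive editor like any other Hive table. The following is a minimal sketch, not a tuned setup: the filter value and LIMIT are arbitrary, and the read-percent setting is shown only to illustrate how the share of provisioned read capacity used by a live query can be bounded (0.5 is the documented default for the EMR DynamoDB connector).

```sql
-- Limit how much of the table's provisioned read capacity Hive may use.
-- 0.5 is the connector's default; adjust it to your table's capacity.
SET dynamodb.throughput.read.percent=0.5;

-- Example query against the live DynamoDB table through the Hive mapping.
-- col1 maps to the DynamoDB attribute "name", col2 to "year".
SELECT col1, col2
FROM hivetable1
WHERE col2 >= 2000
LIMIT 10;
```

Because the table is external and backed by DynamoDB, each run of the SELECT scans the live table and consumes read capacity, which matches the disadvantage noted above.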