I'm using Amazon DynamoDB to collect statistics and ElasticMapReduce with Hive to process statistics and upload results to S3.
On DynamoDB I have table prod_product_views: - id (Hash key) - product_id (Range key) - company_id - creted - price - viewed_by_company_id - viewed_by_user_id
For now in this table there are about 7000 records.
Problem is that hiveql running to slow.
For example I have first step to create external table stored on DynamoDB:
CREATE EXTERNAL TABLE prod_product_views (id string, product_id bigint, company_id bigint, created bigint, price string, viewed_by_company_id bigint, viewed_by_user_id bigint)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "prod_product_views",
"dynamodb.column.mapping" = "id:id,product_id:product_id,company_id:company_id,created:created,price:price,viewed_by_company_id:viewed_by_company_id,viewed_by_user_id:viewed_by_user_id");
This step is ok (Time taken: 12.908 seconds)
Secund step is to get views for last day:
SELECT * from prod_product_views
WHERE
created > UNIX_TIMESTAMP(CONCAT(DATE_SUB(FROM_UNIXTIME(UNIX_TIMESTAMP()), 1)," ","00:00:00"))
and created < UNIX_TIMESTAMP(CONCAT(DATE_SUB(FROM_UNIXTIME(UNIX_TIMESTAMP()), 1)," ","23:59:59"));
This step takes a long time (about 60min) maybe more.
This is part of output:
2013-05-23 08:23:06,097 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:07,103 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:08,109 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:09,115 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:10,121 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:11,147 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:12,153 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:13,160 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:14,169 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:15,177 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:16,183 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:17,193 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:18,219 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:19,225 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:20,234 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:21,240 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:22,247 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:23,253 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:24,259 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:25,265 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:26,273 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:27,279 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:28,290 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:29,312 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:30,318 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:31,324 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:32,333 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:33,358 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:34,364 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:35,394 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:36,400 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:37,408 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:38,418 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:39,478 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:40,538 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:41,544 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:42,550 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:43,557 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:44,563 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:45,569 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:46,579 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:47,607 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:48,613 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:49,623 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:50,633 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:51,638 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:52,650 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:53,657 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:54,665 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:55,691 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:56,697 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
I'm new with this kind of services, am I doing something wrong, or is there some tricks in configuration or something to speed up this? Because this seems like a simple query and 7000 records is not a large amount of data.
Thanks in advance!