5

I am trying to load the DynamoDb export file which is taken from Amazon Dynamodb Web Console with "Import/Export" tool into Hive. But I couldn't map the fields properly because DynamoDB Web Console "Export" tool is using "ETX" "STX".

Below is an example row ended with [LF]

Elapsed[ETX]{"n":"1477"}[STX]Device[ETX]{"n":"3"}[STX]Date[ETX]{"s":"2014-03-05T12:13:00.852Z"}[STX]Duration[ETX]{"n":"8075"}[LF]

What should be the query for this?

CREATE EXTERNAL TABLE IF NOT EXISTS TableNameHere (creationDate string, device bigint, duration bigint, elapsed bigint)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ???This is where i got stuck??? 
LOCATION 's3://abcdefg/ino/2015-05-28_12.22';

UPDATE

I have updated query but it didn't work again.

'\002' for STX
'\012' for LF

CREATE EXTERNAL TABLE IF NOT EXISTS TableNameHere  (creationDate string, device bigint, duration bigint, elapsed bigint)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' 
                     LINES TERMINATED BY '\012'
LOCATION 's3://abcdefg/ino/2015-05-28_12.22';

Result for the query:

Elapsed{"n":"0"}   Device{"n":"3"}   Duration{"n":"1073876"}   Date{"s":"2014-01-27T00:52:25.491Z"}     

So, now how can i parse this data? I need to map fields. Should i use a custom SerDe?

Josh Crozier
  • 233,099
  • 56
  • 391
  • 304
Barbaros Alp
  • 6,405
  • 8
  • 47
  • 61
  • This is the format that i am trying to map its fields in hive: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-pipelinejson-verifydata2.html Can it be done with a custom SerDe? – Barbaros Alp May 29 '15 at 07:20

1 Answers1

2

I think there's already a SerDe for that: https://github.com/lyft/dynamodb-hive-serde

Roberto Congiu
  • 5,123
  • 1
  • 27
  • 37