1

My data is stored in HDFS at directory /tmp/kafka/alert in multiple files. Each file contain new-line separated JSON objects like following.

{"alertHistoryId":123456,"entityId":123,"deviceId":"123","alertTypeId":1,"AlertStartDate":"Dec 28, 2016 12:05:48 PM"}
{"alertHistoryId":123456,"entityId":125,"deviceId":"125","alertTypeId":5,"AlertStartDate":"Dec 28, 2016 11:58:48 AM"}

I added hive JSON SerDe jar using below

ADD JAR /usr/local/downloads/hive-serdes-1.0-SNAPSHOT.jar;

I created table with following

CREATE EXTERNAL TABLE IF NOT EXISTS my_alert (
alertHistoryId bigint, entityId bigint, deviceId string, alertTypeId int,  AlertStartDate string
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/tmp/kafka/alert';

table created successfully. But when I fetched data, I got all null values. Anyone got any idea how to resolve this?

dtolnay
  • 9,621
  • 5
  • 41
  • 62
Arjit
  • 421
  • 7
  • 20
  • look at this solution: http://stackoverflow.com/questions/40854177/cloudera-hive-where-to-add-json-serde-1-3-7-jar-file – Rijul Dec 29 '16 at 09:49
  • Thanks Rijul. But that and mine are not the same problem. I didn't get any exception. External table created successfully. Even the row count is right. but data is all null. I'm sure I'm missing something minor but can't figure out what. – Arjit Dec 29 '16 at 09:57

2 Answers2

1

Dont use Serde Adding Jar and converting those is always overhead.Rather than you can read the JSON using inbuilt get_json_object and json_tuple .if you are looking for an example how to use see this blog querying-json-records-via-hive

If you wanted to use JSON Serde only then have a look on this Hive-JSON-Serde. Before test it out first of all validate the JSON Validator.

Indrajit Swain
  • 1,505
  • 1
  • 15
  • 22
  • Thanks @Indrajit. The blog you mentioned says we can first load the data in hive table using `LOAD DATA LOCAL INPATH '/tmp/simple.json' INTO TABLE json_table;` But I don't have a single json file to load the data from. I have a number of dynamically generating files on HDFS. Can you guide me how to work with this solution in my context? – Arjit Dec 29 '16 at 09:31
  • Updated the Answer @Arijit – Indrajit Swain Dec 29 '16 at 10:10
0

You are using old version of JSON Serde. There might be an issue with your JSON Serde and Hadoop Distribution. Please find below link to get new version of Json Serde. Follow the steps from the link to build it according to your Hadoop distribution.

https://github.com/rcongiu/Hive-JSON-Serde

Please see below working example.

hive> add jar /User/User1/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
Added [/User/User1/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar] to class path
Added resources: [/User/User1/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar]
hive> use default;
OK
Time taken: 0.021 seconds
hive> CREATE EXTERNAL TABLE IF NOT EXISTS json_poc (
    > alertHistoryId bigint, entityId bigint, deviceId string, alertTypeId int,  AlertStartDate string
    > )
    > ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    > LOCATION '/User/User1/sandeep_poc/hive_json';
OK
Time taken: 0.077 seconds
hive> select * from json_poc;
OK
123456  123     123     1       Dec 28, 2016 12:05:48 PM
123456  125     125     5       Dec 28, 2016 11:58:48 AM
Time taken: 0.052 seconds, Fetched: 2 row(s)
hive>

How to build jar.

Maven should be installed on your PC then run command like this.

C:\Users\User1\Downloads\Hive-JSON-Serde-develop\Hive-JSON-Serde-develop>mvn -Phdp23 clean package

In my case I am using hdp2.3 so I have provided -Phdp23

Hope it will help if you are willing to use Hive JSON Serde.

Sandeep Singh
  • 7,790
  • 4
  • 43
  • 68
  • Thanks Sandeep. It did create table successfully. Although yours is rather a different jar all together but this worked for me. – Arjit Dec 29 '16 at 10:12