-1

I have one issue in my hive code. I want to extract JSON data from using HIVE.Following is the sample json format

{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"versionModified"{"machine":"123.dfer","founder":"3.0","state":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}

I want to get the following fields

  • ver
  • type
  • vehicle
  • ts
  • founder
  • state

the issue is founder and state is in one array "version" can anybody help how to get rid of this? some times instead of versionmedified something else may come

eg: some times my data will be like

{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"anotherCriteria":{"engine":"123.dfer","developer":"3.0","state":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}

adding some sample data below:

{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"ABC"{"XYZ":"123.dfer","founder":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}


{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"GAP"{"XVY":"123.dfer","FAH":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}


{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"BOX"{"VOG":"123.dfer","FAH":"3.0","FAX":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}

i need to put this data into various tables based on version if it is "BOX" the put in one table if it is "GAP" put another...

franklinsijo
  • 17,784
  • 4
  • 45
  • 63
BigD
  • 850
  • 2
  • 17
  • 40

1 Answers1

1

you can use json serde to fetch all fields

Just follow below steps

1.Download json serde from http://www.congiu.net/hive-json-serde/1.3/

2.Add json serde Jar

hive> ADD jar /root/json-serde-1.3-jar-with-dependencies.jar;
Added [/root/json-serde-1.3-jar-with-dependencies.jar] to class path
Added resources: [/root/json-serde-1.3-jar-with-dependencies.jar]

3.create table

CREATE TABLE json_serde_table (
  Rtype struct<ver:int, os:string,type:string,vehicle:string,MOD: struct<Version:Array<struct<versionModified:struct<machine:string,founder:string,state:string,fashion:string,cdc:string,dof:string,ts:string>>>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

4.load json file into table

hive> load data local inpath '/root/json.txt' INTO TABLE json_serde_table;
Loading data to table default.json_serde_table
Table default.json_serde_table stats: [numFiles=1, totalSize=234]
OK
Time taken: 0.877 seconds

5.Fire below query to get result

hive> select Rtype.ver ver ,Rtype.type type ,Rtype.vehicle vehicle ,Rtype.MOD.version[0].versionModified.ts ts,Rtype.MOD.version[0].versionModified.founder founder,Rtype.MOD.version[0].versionModified.state state from json_serde_table;
Query ID = root_20170412170606_a674d31b-31d7-477b-b9ff-3ebd76636cf8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1491484583384_0018, Tracking URL = http://mac127:8088/proxy/application_1491484583384_0018/
Kill Command = /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/bin/hadoop job  -kill job_1491484583384_0018
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-04-12 17:06:44,990 Stage-1 map = 0%,  reduce = 0%
2017-04-12 17:06:53,361 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.8 sec
MapReduce Total cumulative CPU time: 1 seconds 800 msec
Ended Job = job_1491484583384_0018
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.8 sec   HDFS Read: 4891 HDFS Write: 50 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 800 msec
OK
1       ns      Mh-3412 2000-04-01T00:00:00.171Z        3.0     Florida
Time taken: 19.745 seconds, Fetched: 1 row(s)
Shalaj
  • 579
  • 8
  • 19
  • colon (:) is missing in your json data after "versionModified" field – Shalaj Apr 12 '17 at 12:00
  • Why would you download a JSON SerDe when it is already a part of the distribution? – David דודו Markovitz Apr 12 '17 at 12:07
  • you are right same can be done with org.apache.hadoop.hive.contrib.serde2.JsonSerde , i have just tried and it gives me same result.. i will edit my answer.. thanks – Shalaj Apr 12 '17 at 12:15
  • Thanks you all for great help..my source is a table and this entire json content in one column . in that case how do i load? – BigD Apr 12 '17 at 13:27
  • i also done it in a single column only, just check the create table script, here we have a single column RType – Shalaj Apr 12 '17 at 13:40
  • waht i was telling lis like...instead of using load data local inpath '/root/json.txt' INTO TABLE json_serde_table this json.txt itself is in another table as a column – BigD Apr 12 '17 at 14:08
  • added new question for insert... can you help @Dudu – BigD Apr 13 '17 at 04:44