1

I have a simple hive table:

hive> show create table  tweets;
OK
CREATE EXTERNAL TABLE `tweets`(
  `json_body` string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'file:/tmp/1'
TBLPROPERTIES (
  'bucketing_version'='2',
  'transient_lastDdlTime'='1551081429')
Time taken: 0.124 seconds, Fetched: 13 row(s)

in the folder /tmp/1 there is a file test.json and the only contents in file are {"appname":"app-name"}

select from tweets returns NULL

hive> select * From tweets;
OK
NULL
Time taken: 0.097 seconds, Fetched: 1 row(s)

I know either the fileformat is wrong or something else is going on. can someone please help.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
John Smith
  • 315
  • 2
  • 9

1 Answers1

1

If you want JsonSerDe to parse attributes then create table like this:

CREATE EXTERNAL TABLE tweets (
  appname string 
)
ROW FORMAT SERDE
  'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/tmp/1' --this is HDFS/S3 location
;

Read also docs about JsonSerDe

And if you want to get the whole JSON object as a string json_body then you do not need JSON SerDe, use TEXTFILE instead:

CREATE EXTERNAL TABLE tweets (
  json_body string 
)
STORED AS TEXTFILE
LOCATION '/tmp/1' --this is HDFS/S3 location
;
leftjoin
  • 36,950
  • 8
  • 57
  • 116