
How can I insert overwrite directory with a JSON schema?

There is a raw Hive Avro table (it actually has many more fields than shown):

tb_test--------
name string
nickname string
-----------------

Then I want to save the query result into a directory in HDFS using JsonSerDe.

I tried this:

insert overwrite directory '/json/'
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH SERDEPROPERTIES (
 "stat_name"="$._col0",
 "stat_interval"="$._col1"
)
STORED AS TEXTFILE 
select name, nickname
from tb_test limit 100

But the JSON written to /json/ has _colXX field names instead of the original field names.

{"_col0":"basic_qv","_col1":"h"}
{"_col0":"basic_qv","_col1":"h"}
{"_col0":"basic_qv","_col1":"h"}
{"_col0":"basic_qv","_col1":"h"}
{"_col0":"basic_qv","_col1":"h"}

I expected

{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}

What would help with this?

Thanks!!

Jihun No
  • I'm facing a similar issue with AVRO format. Have you found the solution? Please let me know if you know any workarounds. – Munesh Jul 11 '17 at 21:54
  • Is Avro the output format or the table format? I couldn't find any workaround in Hive. Instead I did this with Spark: run Spark SQL, then export the dataframe as JSON. Spark supports various export formats natively. – Jihun No Jul 11 '17 at 23:49
  • AVRO is the required output file format. Yes, Spark is a good workaround. – Munesh Jul 12 '17 at 23:43
  • @JihunNo could you share a sample avro file that you are trying out ? – DataWrangler Oct 25 '17 at 14:06
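The Spark workaround described in the comments above can be sketched roughly like this (PySpark; the session setup, app name, and reuse of the question's /json/ path are assumptions, not from the original posts):

from pyspark.sql import SparkSession

# A Spark session with Hive support, so spark.sql can read the Hive table
spark = (SparkSession.builder
         .appName("export-tb-test-json")   # hypothetical app name
         .enableHiveSupport()
         .getOrCreate())

df = spark.sql("select name, nickname from tb_test limit 100")

# The DataFrame writer keeps the real column names,
# so the output lines look like {"name":"...","nickname":"..."}
df.write.mode("overwrite").json("/json/")

This sidesteps the Hive issue entirely because Spark carries the column names through to the JSON writer instead of renaming them to _colN.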

2 Answers


It seems your issue, along with a workaround (using a JSON UDF with named_struct), is described here: https://github.com/rcongiu/Hive-JSON-Serde/issues/151

extract.hql:
add jar /home/myuser/lib/json-udf-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
create temporary function tjson as 'org.openx.data.udf.JsonUDF';

insert overwrite local directory '/json/'
select
tjson(named_struct("name", t.name,"nickname", t.nickname))
from tb_test t
;

Alternatively, you can create a JsonSerDe-based table with the columns defined, insert overwrite into it, and use the table's location instead of a plain directory.
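A minimal sketch of that table-based alternative, assuming the tb_test table and /json/ location from the question (the table name tmp_json is hypothetical):

create table tmp_json (
  name string,
  nickname string
)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile
location '/json/';

insert overwrite table tmp_json
select name, nickname
from tb_test
limit 100;

Because the insert targets a table whose columns are named, the SerDe writes {"name":...,"nickname":...} instead of the _col0/_col1 placeholder names.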

leftjoin

I ran into this today. In case you can't or don't want to use UDFs, following this guideline: https://docs.aws.amazon.com/athena/latest/ug/json-serde.html, this worked for me (note that the column order matters, and the mapping. prefix is used):

insert overwrite directory '/json/'
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
 "case.insensitive" = "false",
 "mapping._col0" = "stat_name",
 "mapping._col1" = "stat_interval"
)
select name, nickname
from tb_test limit 100
Shervin