I have a Hive table defined using a JSON SerDe. I'm using the Shark distribution (http://shark.cs.berkeley.edu/). The definition is as follows:
CREATE TABLE lastfm(
artist string,
title string,
track_id string,
similars array<array<string>>,
tags array<array<string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
I am able to load data into this table successfully. Next, I created a Parquet-based table in Hive:
CREATE TABLE lastfm_par (
artist string,
title string,
track_id string,
similars array<array<string>>,
tags array<array<string>>
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
Now I try to load the data from the JSON SerDe table into the Parquet one using the following command:
insert overwrite table lastfm_par select * from lastfm;
The insert statement completes successfully, but when I query the data in the Parquet table, all the columns are populated with NULL values. I searched online for similar issues but haven't found anything comparable. Does anyone have thoughts on what's going wrong here?
Thanks, Visakh