Issue with loading data into Parquet table from a JSON Serde based Hive table

Question

I have a Hive table defined with a JSON SerDe, running on the Shark distribution (http://shark.cs.berkeley.edu/). The definition is as follows:

CREATE TABLE lastfm (
  artist string,
  title string,
  track_id string,
  similars array<array<string>>,
  tags array<array<string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

I can load data into this table successfully. I then created a Parquet-based table in Hive:

CREATE TABLE lastfm_par (
  artist string,
  title string,
  track_id string,
  similars array<array<string>>,
  tags array<array<string>>
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

Next, I try to copy the data from the JSON SerDe table into the Parquet table with the following command:

insert overwrite table lastfm_par select * from lastfm;

The insert statement completes successfully, but when I query the Parquet table, every column in every row is NULL. I searched online for similar issues but have not found any. Does anyone have thoughts on what is going wrong here?
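
One way to narrow this down, offered as a diagnostic sketch rather than a confirmed fix: check that the source table itself returns non-NULL values, compare the column metadata of the two tables, and repeat the load with an explicit column list instead of `select *`. The statements below use only the table and column names defined above. Whether naming the columns explicitly changes the result with this Shark and Parquet SerDe combination is an assumption to test, not a known outcome.

```sql
-- Confirm the JSON SerDe table returns real values rather than NULLs.
SELECT artist, title, track_id FROM lastfm LIMIT 5;

-- Compare column names, types, and SerDe settings of both tables.
DESCRIBE FORMATTED lastfm;
DESCRIBE FORMATTED lastfm_par;

-- Repeat the load with an explicit column list
-- (assumption: this may behave differently from SELECT *).
INSERT OVERWRITE TABLE lastfm_par
SELECT artist, title, track_id, similars, tags
FROM lastfm;

-- Count the rows in which all scalar columns are NULL.
SELECT COUNT(*)
FROM lastfm_par
WHERE artist IS NULL AND title IS NULL AND track_id IS NULL;
```

If the first query already shows NULLs, the problem lies in reading the JSON data rather than in the Parquet write.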

Thanks, Visakh
