I have a Hive table defined with a JSON SerDe, running on the Shark distribution (http://shark.cs.berkeley.edu/). The definition is as follows:

CREATE TABLE lastfm (
  artist string,
  title string,
  track_id string,
  similars array<array<string>>,
  tags array<array<string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

I am able to load data into this table successfully. I then created a Parquet-based table in Hive:

CREATE TABLE lastfm_par (
  artist string,
  title string,
  track_id string,
  similars array<array<string>>,
  tags array<array<string>>
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

Now, I try to load the data from the JSON Serde table to the Parquet one using the following command:

insert overwrite table lastfm_par select * from lastfm;

The insert statement completes successfully, but when I query the data in the Parquet table, all the columns are populated with NULL values. I searched online and haven't found a report of the same problem. Does anyone have thoughts on what's going wrong here?
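
For reference, here is a minimal sketch of the checks that could narrow this down. It uses only the two tables defined above. It is a diagnostic aid, not a known fix, and I have not confirmed which of these (if any) exposes the cause. The explicit column list in the last statement is an assumption that naming the columns might behave differently from `select *`.

```sql
-- Compare row counts: do rows arrive in the Parquet table at all?
SELECT COUNT(*) FROM lastfm;
SELECT COUNT(*) FROM lastfm_par;

-- Check the simple string columns separately from the nested array columns.
SELECT artist, title, track_id FROM lastfm LIMIT 5;
SELECT artist, title, track_id FROM lastfm_par LIMIT 5;

-- Confirm which SerDe and input/output formats the table actually registered.
DESCRIBE FORMATTED lastfm_par;

-- Retry the load with an explicit column list instead of SELECT *.
INSERT OVERWRITE TABLE lastfm_par
SELECT artist, title, track_id, similars, tags
FROM lastfm;
```

If the row counts match but even the string columns come back NULL, that would point at how the Parquet table is read or written rather than at the nested array columns.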

Thanks, Visakh

visakh