My Hive tables are stored in Parquet format at a location in HDFS. Can I convert the Parquet files at this location to SequenceFile format and build Hive tables over them? Is there a procedure for this conversion?

Andy Reddy

2 Answers

Create a new SequenceFile table and reload the data with an insert ... select:

insert into sequence_table
select * from parquet_table;
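
The statement above assumes `sequence_table` already exists. A minimal sketch of the full sequence, assuming a hypothetical two-column schema (`id`, `name`) that stands in for whatever columns `parquet_table` really has:

```sql
-- Hypothetical schema: replace the columns with those of parquet_table.
CREATE TABLE sequence_table (
  id   INT,
  name STRING
)
STORED AS SEQUENCEFILE;

-- Hive reads the Parquet rows and rewrites them as SequenceFile data.
INSERT INTO TABLE sequence_table
SELECT * FROM parquet_table;
```

`select *` only lines up if both tables declare the same columns in the same order; otherwise list the columns explicitly.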
leftjoin
  • let me try it. Thank you. – Andy Reddy Mar 27 '17 at 17:16
  • If my sequence table is partitioned by year, month, day, how can I insert all the records from my Parquet table, which is partitioned the same way, as-is into my sequence table? – Andy Reddy Mar 29 '17 at 04:14
  • Create the partitioned table, then run `insert overwrite table sequence_table partition (year, month, day) select ... from parquet_table`. The partition keys should come last in the select list; add `distribute by` on the partition keys at the end to reduce pressure on the reducers. If the target table has exactly the same structure, you can use `select *`. – leftjoin Mar 29 '17 at 06:54
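
Spelled out, the insert described in the comment above might look like the sketch below. The column names `col1` and `col2` are hypothetical placeholders, and the two `set` lines assume dynamic partitioning is not already enabled in the session:

```sql
-- Let Hive resolve every partition value from the data itself.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE sequence_table PARTITION (year, month, day)
SELECT
  col1,              -- hypothetical non-partition columns
  col2,
  year, month, day   -- partition keys last, in PARTITION clause order
FROM parquet_table
-- Route each partition's rows to the same reducer.
DISTRIBUTE BY year, month, day;
```

Note that `insert overwrite` replaces the contents of each partition it writes to, whereas `insert into` appends.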
hive> create table src (i int) stored as parquet;
OK
Time taken: 0.427 seconds
hive> create table trg stored as sequencefile as select * from src;

For @AndyReddy

create table src (i int) 
partitioned by (year int,month tinyint,day tinyint)
stored as parquet
;

create table trg (i int) 
partitioned by (year int,month tinyint,day tinyint)
stored as sequencefile
;

set hive.exec.dynamic.partition.mode=nonstrict
;

insert into trg partition(year,month,day)
select * from src
;
David דודו Markovitz
  • If my sequence table is partitioned by year, month, day, how can I insert all the records from my Parquet table, which is partitioned the same way, as-is into my sequence table? Just do an insert into? – Andy Reddy Mar 29 '17 at 04:35