1

How to merge existing Partition small files into one large file in one of the Partition .

For example I have a table user1, it contain columns fname,lname and partition column is day.

I have created table by using below script

CREATE TABLE user1(fname string,lname string) parittioned By (day int);

After inserting data into partion table it will look like below.

 fname  lname  day
.....................
AA      AAA   20170201     ....>partition 20170201
BB      BBB   20170201
...................
CC      CCC   20170202    ......>partition 20170202
DD      DDD   20170202
....................
EE      EEE   20170203    .......>partition 20170203
FF      FFF   20170203
.......................
GG      GGG   20170204    ........>partition 20170204         
HH      HHH   20170204
.......................

When I execute select query with the help of partition column i.e. day=20170201.

select * from user1 where day=20170201;

It will give result like below

AA      AAA   20170201
BB      BBB   20170201

based on above table i want to merge the all small files i.e day =20170201 and day =20170202 and day=20170203 into partition day=20170203 in my partition table (i.e USer1).i.e. It should look like below.

fname  lname  day
.....................
AA      AAA   20170201
BB      BBB   20170201
CC      CCC   20170202    
DD      DDD   20170202
E       EEE   20170203    .......>partition 20170203
FF      FFF   20170203
.......................
GG      GGG   20170204    ........>partition 20170204         
HH      HHH   20170204
.......................

can you please suggest on this,How can I achieve this?

Thanks in Advance.

Sai
  • 1,075
  • 5
  • 31
  • 58

1 Answers1

0
  1. Create new table partitioned by new field partition_day:
CREATE TABLE user_new(fname string,lname string, day int) parittioned By (partition_day int);
  1. Load data into new table (define your conditions for new partitionsin the case )
   insert overwrite table user_new partition (partition_day)
    select fname,lname, day,
           case when day <= 20170203 then 20170203
                when day >  20170203 then 20170204
           end as partition_day
      from user1 ;
Kishore
  • 5,761
  • 5
  • 28
  • 53
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Hi Thanks for reply, it will work for new table but I want to use same existing table. so I add new column to existing table with the help of “alter table user1 add columns(partition_day int);” New column add with NULL value. Now partition_day contains values “NULL” But after that I am trying to override the same table by selecting the same table like below. “insert overwrite table user1 partition (day)select fname,lname,day as partition_day, case when day <= 20170206 then 20170206 when day > 20170206 then 20170207 end as day from user1” – Sai Feb 08 '17 at 12:31
  • Then also partition_day contains Null values only. Can please help on this how can set the day value to partition_Day. Thanks in Advance . – Sai Feb 08 '17 at 12:31