0

Have a multi level partitioned Hive table,now need to delete the partitioned folders which are older than certain years.

Multilevel partitions looks as below.

 /data/warehouse/suite/catalyst/site/company=abc/year=2019/month=08
 /data/warehouse/suite/catalyst/site/company=cde/year=2018/month=05
  /data/warehouse/suite/catalyst/site/company=cde/year=2017/month=11
 /data/warehouse/suite/catalyst/site/company=cde/year=2016/month=11

If want to delete the partitions older than 2 years, That means /year=2017/month=11 and year=2016/month=11 need to be deleted How it can be done.

Pls help, Thanks in advance.

Girish HM
  • 41
  • 4
  • 1
    See this answer: https://stackoverflow.com/a/56646879/2700344 - almost exactly the same – leftjoin Dec 13 '19 at 10:46
  • Does this answer your question? [hive drop all partitions keep recent 4 days paritions](https://stackoverflow.com/questions/56645668/hive-drop-all-partitions-keep-recent-4-days-paritions) – leftjoin Dec 14 '19 at 11:48
  • @leftjoin Thank for your response..gone through your answer. It looks well and good. I have found different solution. – Girish HM Dec 18 '19 at 06:35

1 Answers1

0
ALTER TABLE mytable drop if exists partition (year<='2017')

You can not control the partition deletion as you are expecting

You can try it using the unix way that is more reliable.

 hive -S -e "show partitions test" > tmp.txt
curr_year=`expr "$(date +'%Y')" - "2"`
curr_mon=`expr "$(date +'%m')" - "1"`

cur_part=$curr_year$curr_mon

cur_part=201812
echo $cur_part


#echo "year=2016/month=12" | cut -d '=' -f 2 | grep -o -E '[0-9]+'
#echo "year=2016/month=12" | cut -d '=' -f 4 | grep -o -E '[0-9]+'

while read -r line
do
  part_year=`echo $line | cut -d '=' -f 2 | grep -o -E '[0-9]+'`
  part_mon=`echo $line | cut -d '=' -f 3 | grep -o -E '[0-9]+'`
  part_part=$part_year$part_mon
  echo $part_part

if [[ $part_part -lt $cur_part ]]
    then
    echo "$part_year , $part_mon"
    hive --hivevar year="$part_year" --hivevar month="$part_mon" -e 'ALTER TABLE test DROP IF EXISTS PARTITION (year="${hivevar:year}", month="${hivevar:month}")'

  fi


done < tmp.txt


    > show partitions test;
OK
year=2016/month=12
year=2017/month=11
year=2017/month=12
year=2018/month=12

> show partitions test;
OK
year=2017/month=12
year=2018/month=12

i have tested it is working fine

Strick
  • 1,512
  • 9
  • 15
  • Strick Thanks for your response, We need to consider months as well that means we need to keep year=2017 and month =12 partition if we run today. and here company also higher level partition coulmn – Girish HM Dec 13 '19 at 09:02
  • Thanks for the your valuable answer, It gave me one more option. Thanks! – Girish HM Dec 18 '19 at 06:37
  • @GirishHM If it helped you can you please upvote it and mark it as answer – Strick Dec 18 '19 at 07:15