2

I have a table with partitions like below :
TABLE logs PARTITION(year = 2019, month = 06, day = 18)

partitions 'year', 'month' and 'day' are in string format.

I need to drop partitions keeping last seven days partitions. and need to run the job every week so that, logs tables will have 7 days logs at the start of every week.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
Deamon
  • 45
  • 4

1 Answers1

2

You can use <= operator in partition specification.

Demo:

use mydb;
drop table test_partition_drop;
CREATE TABLE test_partition_drop
(col1 STRING)
PARTITIONED BY (part_year string, part_month string, part_day string);
INSERT INTO TABLE  test_partition_drop PARTITION (part_year='2019', part_month='06', part_day='09') VALUES ('01');
INSERT INTO TABLE  test_partition_drop PARTITION (part_year='2019', part_month='06', part_day='10') VALUES ('01');
INSERT INTO TABLE  test_partition_drop PARTITION (part_year='2019', part_month='06', part_day=11) VALUES ('02');
INSERT INTO TABLE  test_partition_drop PARTITION (part_year='2019', part_month='06', part_day=12) VALUES ('03');
INSERT INTO TABLE  test_partition_drop PARTITION (part_year='2019', part_month='06', part_day=13) VALUES ('05');
INSERT INTO TABLE  test_partition_drop PARTITION (part_year='2019', part_month='06', part_day=14) VALUES ('06');
INSERT INTO TABLE  test_partition_drop PARTITION (part_year='2019', part_month='06', part_day=15) VALUES ('06');
INSERT INTO TABLE  test_partition_drop PARTITION (part_year='2018', part_month='06', part_day=14) VALUES ('01');
INSERT INTO TABLE  test_partition_drop PARTITION (part_year='2018', part_month='06', part_day=15) VALUES ('02');
INSERT INTO TABLE  test_partition_drop PARTITION (part_year='2019', part_month='05', part_day=14) VALUES ('03');
INSERT INTO TABLE  test_partition_drop PARTITION (part_year='2019', part_month='04', part_day=15) VALUES ('04');

Calculate min partition keys to be dropped and pass to your DROP PARTITION script:

var_year="$(date -d "7 days ago" +"%Y")"
var_month="$(date -d "7 days ago" +"%m")"
var_day="$(date -d "7 days ago" +"%d")"

hive -e "
use mydb;
ALTER TABLE test_partition_drop DROP IF EXISTS PARTITION (part_year<'${var_year}');
ALTER TABLE test_partition_drop DROP IF EXISTS PARTITION (part_year='${var_year}', part_month<'${var_month}'); 
ALTER TABLE test_partition_drop DROP IF EXISTS PARTITION (part_year='${var_year}', part_month<='${var_month}', part_day<='${var_day}');
"

Result:

OK
Time taken: 0.762 seconds
Dropped the partition part_year=2018/part_month=06/part_day=14
Dropped the partition part_year=2018/part_month=06/part_day=15
OK
Time taken: 1.643 seconds
Dropped the partition part_year=2019/part_month=04/part_day=15
Dropped the partition part_year=2019/part_month=05/part_day=14
OK
Time taken: 1.0 seconds
Dropped the partition part_year=2019/part_month=06/part_day=09
Dropped the partition part_year=2019/part_month=06/part_day=10
Dropped the partition part_year=2019/part_month=06/part_day=11
Dropped the partition part_year=2019/part_month=06/part_day=12
Dropped the partition part_year=2019/part_month=06/part_day=13
Dropped the partition part_year=2019/part_month=06/part_day=14
OK
Time taken: 2.097 seconds
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • i am using this : beeline -u "jdbc_connection string" -n hive --hivevar var_year="$(date -d " days ago" +"%Y")" --hivevar var_month="$(date -d "7 days ago" +"%m")" --hivevar var_day="$(date -d "7 days ago" +"%d")" -e 'ALTER TABLE tablename DROP IF EXISTS PARTITION (part_year<= "${hivevar:var_year}" , mth<="${hivevar:part_month}" , dy<="${hivevar:part_day}");' but this one deleting keeping 7 days paritions in current month and deleting remaining all in current month, in previous months its keeping same partitions and deleting whichever partitions got deleted in current month. – Deamon Jun 21 '19 at 06:09
  • what i want like .. keep only current month last 7 days partitions and delete remaining each and every partition – Deamon Jun 21 '19 at 06:12
  • @Thilak Yes, it was a bug in the logic. Need to add more drop sentences to drop old years and old monht in curr year. Fixed. – leftjoin Jun 21 '19 at 07:13
  • Thank you so much. One more situation. what if we implement this at start of the month. like june 4th implementing. after running, i want june 1st,2nd,3rd,4th and may 31st,30th,29th only. remaining all needs to get drop. – Deamon Jun 21 '19 at 08:04
  • @Thilak Test it. var_month="$(date -d "7 days ago" +"%m")" will be set May, all before May will be deleted by first two sentences. And final drop will remove part_month<='05', part_day<='29'. – leftjoin Jun 21 '19 at 08:09
  • This one is elegant solution – vikrant rana Jun 24 '19 at 17:09