1

I have 2 types of value in the partition column of string datatype:

  1. yyyyMMdd
  2. yyyy-MM-dd

E.g. there are partition column values 20200301, 2020-03-05, 2020-05-07, 20200701, etc.

I need to drop partitions less than 20200501 with a DDL statement like

alter table tblnm drop partition(partcol < 20200501);

When I drop partitions using yyyy-MM-dd format, only 2020-03-05 partition is dropped. But when I drop partitions using yyyyMMdd format, 20200301 is dropped as well as all the partitions containing hyphen (-).

How to compare value by ignoring hyphen or by ignoring data that contains hyphen? I can replace hyphen in the alter table query if necessary.

GoodDok
  • 1,770
  • 13
  • 28

1 Answers1

0

The reason why all the partitions with hyphen get dropped is the comparison of strings in java: every 2020-XX-XX string is less than every 2020XXXX string.

To restrict partitions to those without hyphen you should add a lower bound to the DDL in the format yyyy0101:

alter table tblnm drop partition(partcol < 20200501, partcol >= 20200101);

Beware that the partitions for the previous years won't be dropped, but you're free to run something like

alter table tblnm drop partition(partcol <= 20191231, partcol >= 20190101);

whenever you need.

GoodDok
  • 1,770
  • 13
  • 28
  • Thanks for the reply. This is helpful. But my scenario needs past n of years also. Anyways thank u very much – Keerthana Somu Jul 20 '20 at 12:35
  • 1 more doubt.. In impala i can able to drop partitions with regexp. After dropping is therr any way to reflect that in hive/hdfs – Keerthana Somu Jul 20 '20 at 12:37
  • I'd probably look into [MSCK REPAIR TABLE](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)) command – GoodDok Jul 20 '20 at 22:32