0

I want to delete all Kudu RANGE partitions from the kudu table which has partition value less than a given date string. I am using following query but it's not working. Can someone please suggest what is the workaround.

alter table test_table drop if exists range partition values < '2010-01-31';

My Impala version is 2.6x and it appears to not work with '<' comparison. I can't use '=' because as it's going to be done dynamically and i need one single query to wipe off all empty kudu partitions before the passed date string.

Edit 1: Work-around I implemented. So, I had to write the script to fetch unique partition values and dynamically create the SQL which would give the same output after execution.

 drop_partition_query=''
 read -d '' drop_query <<EOF
select distinct partition_date from test_table where partition_date < '2021-01-01' group by partition_date
EOF
partitions_to_drop=$(${IMPALA_SHELL} -B -q "${drop_query}" --quiet "--output_delimiter=\n" 2>/dev/null)
partitions_array=( $partitions_to_drop )
for (( p=0; p< ${#partitions_array[@]}; p++ ))
do
   drop_partition_query+="ALTER TABLE test_table DROP RANGE PARTITION VALUE='${partitions_array[$p]}';"
done

# drop_partition_query is the sql that should be executed to drop the partitions.
Akanksha_p
  • 916
  • 12
  • 20

1 Answers1

1

I think that on your version you cannot use such syntax, looks like this feature was added in Impala 2.8

Docu

To drop or alter multiple partitions:

In Impala 2.8 and higher, the expression for the partition clause with a DROP or SET operation can include comparison operators such as <, IN, or BETWEEN, and Boolean operators such as AND and OR.

For example, you might drop a group of partitions corresponding to a particular date range after the data "ages out":

alter table historical_data drop partition (year < 1995); alter table historical_data drop partition (year = 1996 and month between 1 and 6);

For tables with multiple partition keys columns, you can specify multiple conditions separated by commas, and the operation only applies to the partitions that match all the conditions (similar to using an AND clause):

alter table historical_data drop partition (year < 1995, last_name like 'A%');

This technique can also be used to change the file format of groups of partitions, as part of an ETL pipeline that periodically consolidates and rewrites the underlying data files in a different file format:

alter table fast_growing_data partition (year = 2016, month in (10,11,12)) set fileformat parquet;

Here is ticket in which it was added if you want to take a look: Jira issue

Not sure how to handle it, maybe you can write some script/spark shell code which will list all partitions and choose only does one which you want and concat them into one query which your Impala can handle

M_S
  • 2,863
  • 2
  • 2
  • 17
  • Thanks for answering. Yes, i have already written a script to execute set of add partition statements for each partition in one batch. Will update the question with the current work around used. – Akanksha_p Dec 28 '22 at 14:12