1

I want to check if a certain partition already exists before "insert overwrite" it. Only need to insert when that partition does not exist.

How to modify this query?

INSERT OVERWRITE TABLE myname.mytable PARTITION (ds='2019-07-19')
leftjoin
  • 36,950
  • 8
  • 57
  • 116
daydayup
  • 2,049
  • 5
  • 22
  • 47

2 Answers2

2

If you insert/overwrite your hive table using dynamically partitioned hive table, then it will overwrite only incase that partition is fetched in your select statement.

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE db.PartitionTable PARTITION (native_country)
select
 age
,workclass
,education
,marital
,occupation
,race
,sex
,native_country
from db.sometable
where native_country='Vietnam';

the other way round is:

#!/bin/bash
table="db.PartitionTable"
partition_col="native_country=Vietnam"
partition_lookup="native_country='Vietnam'"
partition_exists=$(hive -e "show partitions $table" | grep "$partition_col");

echo $partition_exists

#check if partition_exists
 if [ "$partition_exists" = "" ];
 then echo "partition does not exists";
 else echo "partition exists"
 hive -e "select * from $table where ${partition_lookup}" > output.tmp;
 fi
vikrant rana
  • 4,509
  • 6
  • 32
  • 72
1

You can also achieve the same using dynamic partition insert and where partition column NOT in (select from myname.mytable). Something like this:

INSERT OVERWRITE TABLE myname.mytable PARTITION (ds)
select col1, col2 ...
       '2019-07-19' ds --partition column
  from ... 
where ds not in (select distinct ds from myname.mytable where ds='2019-07-19')

In case partition exists select will not return any rows for existing partition and hence will not overwrite the partition. You can also use NOT EXIST for the same.

leftjoin
  • 36,950
  • 8
  • 57
  • 116