
I've created a Hive table with a partition like this:

CREATE TABLE IF NOT EXISTS my_table
(uid INT, num INT) PARTITIONED BY (dt DATE)

Then, using PySpark, I have a DataFrame and I've tried to write it to the Hive table like this:

df.write.format('hive').mode('append').partitionBy('dt').saveAsTable('my_table')

Running this, I get an exception:

Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

I then added this config:

hive.exec.dynamic.partition=true
hive.exec.dynamic.partition.mode=nonstrict
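
For reference, a minimal sketch of applying these settings in PySpark (an assumption; the question doesn't say whether they were set on the SparkSession or in hive-site.xml):

# Hypothetical: setting the Hive options at runtime on an existing
# SparkSession named spark (they can also be passed via
# SparkSession.builder.config(...) before the session is created)
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")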

This time there was no exception, but the table wasn't populated either!

Then I removed the above config and added this:

hive.exec.dynamic.partition=false

I also altered the code to:

df.write.format('hive').mode('append').partitionBy(dt='2022-04-29').saveAsTable('my_table')

This time I get:

Dynamic partition is disabled. Either enable it by setting hive.exec.dynamic.partition=true or specify partition column values

The Spark job I want to run will have daily data, so I guess what I want is a static partition, but how does that work?

Michael

2 Answers


If you haven't predefined all the partitions, you will need to use:

hive.exec.dynamic.partition=true
hive.exec.dynamic.partition.mode=nonstrict

Remember that Hive is schema-on-read, and it won't automagically sort your data into partitions. You need to inform the metastore of the partitions, which you can do manually with one of these two commands:

alter table <db_name>.<table_name> add partition (dt='<date_value>') location '<hdfs_location_of_the_specific_partition>';

or

MSCK REPAIR TABLE [tablename]
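
Putting it together in PySpark, a minimal sketch (assuming a Hive-enabled SparkSession and that df is the DataFrame from the question):

from pyspark.sql import SparkSession

# Build a Hive-enabled session with dynamic partitioning allowed
spark = (SparkSession.builder
         .enableHiveSupport()
         .config("hive.exec.dynamic.partition", "true")
         .config("hive.exec.dynamic.partition.mode", "nonstrict")
         .getOrCreate())

# Append rows; the 'dt' values in the DataFrame decide which
# partitions are written
df.write.format('hive').mode('append').partitionBy('dt').saveAsTable('my_table')

# Tell the metastore about any partition directories it doesn't
# know about yet
spark.sql("MSCK REPAIR TABLE my_table")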
Matt Andruff

If the table is already created and you are using append mode anyway, you can use insertInto instead of saveAsTable, and you don't even need .partitionBy('dt'):

df.write.format('hive').mode('append').insertInto('my_table')
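
One thing to note: insertInto resolves columns by position, not by name, so the DataFrame's columns must line up with the table schema, with the partition column last. A minimal sketch, assuming df contains the question's uid, num and dt columns:

# insertInto is position-based, so order the columns to match the
# table definition: data columns first, partition column (dt) last
df.select('uid', 'num', 'dt').write.mode('append').insertInto('my_table')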
pltc