1

I have a folder in gcs bucket with a folder structure as

Xyz/Abc/dt=03-12-2021/file_03-12-2021.csv
Xyz/Abc/dt=04-12-2021/file_04-12-2021.csv

I am trying to create a dynamic partitioned table on top of the folder by executing the below queries

Set hive.exec.dynamic.partition.mode=nonstrict
Set hive.exec.dynamic.partition=true


Create table tabName (sno int, city string, address string) partitioned by (dt string);

Create table tabStg (sno int, city string, address string, dt string) row format delimited fields terminated by ‘|’ stored as textfile location ‘gcs://Xyz/Abc’;

Insert overwrite table tabName partition(dt) select sno,city,address,dt from tabStg;

After executing the insert statement I get a message as

Loading data to table db.tabName (dt=null)

And if I query as show partitions tabName; I get all the partitions there. However if a new folder is created in the gcs bucket for a new date, the partitioned table is not able to identify that.

Any suggestions why this is happening. Am I missing anything

leftjoin
  • 36,950
  • 8
  • 57
  • 116

1 Answers1

0

There is no such thing in Hive as "dynamic partitioned table". Table can be partitioned or not partitioned. Dynamic partition mode - is how the data is being loaded during INSERT - single static partition can be specified, or partitions can be loaded dynamically from the select.

Example of dynamic partitions load:

Insert overwrite table tabName partition(dt) --partition is not specified 
select sno,city,address,dt from tabStg where...; --will be loaded dynamically

Example of static partition load:

Insert overwrite table tabName partition(dt='2021-03-01') --static partition specified
select sno,city,address from tabStg where dt='2021-03-01'; --take care about correct data loaded

In the last query static partition specified: partition(dt='2021-03-01'), note, that in this case partition column does not present in the select. You can load some other dates in the same partition if do not filter in WHERE.

The difference between two is that in case of dynamic load you can rewrite all existing partitions and load new ones, in case of static load you can load only partition specified in partition spec. If dynamic mode is not enabled, first query will fail, this is kind of protection against unintentional overwrite of partitions, nothing else.

If new partition folder appeared in the table folder, nothing happens automatically, you need to create partition in tabStg table if it was not done yet using MSCK REPAIR or ALTER TABLE, then repeat load into second table tabName using static load or dynamic load.

In addition to this, Hive is able to manage partitions automatically: add partition if new partition folder was loaded in the table folder in the filesystem by third tools. Read more: Manage partitions automatically - if this feature works in your Hive, you can do without adding partitions manually (no REPAIR table or ATER is necessary if you have this feature) in source table. But anyway, loading data into second table should be done as before.

leftjoin
  • 36,950
  • 8
  • 57
  • 116