1

In Hive why I am not allowed to nest a static partition under a dynamic partition?

for eg the below is allowed

INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
SELECT key, value, /*ds,*/ hr FROM srcpart WHERE ds is not null and hr>10;

but this is not allowed

INSERT OVERWRITE TABLE T PARTITION (ds, hr = 11)
SELECT key, value, ds/*, hr*/ FROM srcpart WHERE ds is not null and hr=11;

I find the official wiki page explanation (shown below) insufficient. Prefer a logical explanation or an explanation at underlying map-reduce level.

SP is a subpartition of a DP: should throw an error because partition column order determins directory hierarchy. We cannot change the hierarchy in DML
Ani Menon
  • 27,209
  • 16
  • 105
  • 126
rogue-one
  • 11,259
  • 7
  • 53
  • 75
  • 1
    As an optimization, static partition will be created upfront rather during @ runtime. hence your (ds, hr = 11) will not work becoz parent directory is dynamic. – Nag Jun 22 '14 at 22:11

1 Answers1

0

That's a Hive design issue(specified here):

If there are multiple partitioning columns, their order is significant since that translates to the directory structure in HDFS: partitioned by (ds string, dept int) implies a directory structure of ds=2009-02-26/dept=2.

In a DML or DDL involving partitioned table, if a subset of partitioning columns are specified (static), we should throw an error if a dynamic partitioning column is lower.

Example:

create table nzhang_part(a string) partitioned by (ds string, dept int);
insert overwrite nzhang_part (dept=1)
  select a, ds, dept from T
  where dept=1 and ds is not null;
Ani Menon
  • 27,209
  • 16
  • 105
  • 126