1

I have hive table like below

create external table transaction(
    id int,
    name varchar(60))
    month string
  )
  PARTITIONED BY (
  year string, 
  transaction_type_code varchar(20)
  )
  STORED AS PARQUET 
  LOCATION 'hdfs://xyz';

I am creating one more external table with one more partition column dt like below

create external table transaction_copy(
    id int,
    name varchar(60))
    month string
  )
  PARTITIONED BY (
  dt string,
  year string, 
  transaction_type_code varchar(20)
  )
  STORED AS PARQUET 
  LOCATION 'hdfs://xyz';

Adding partition like below

alter table transaction_copy add if not exists partition (dt='20210811') LOCATION 'hdfs://xyz';

Getting below exception

ERROR: Error while compiling statement: FAILED: ValidationFailureSemanticException partition spec {dt=20210810} doesn't contain all (3) partition columns

I am able to add partition by passing all 3 partitions .

Is it also possible by passing only one partition?

leftjoin
  • 36,950
  • 8
  • 57
  • 116
Manoj Kumar Dhakad
  • 1,862
  • 1
  • 12
  • 26

1 Answers1

1

Not possible to add partition and specify only one column out of three because partitions in Hive are hierarchical, on hdfs partitions are hierarchical folders:

table location
----------------------
   |                |     ...
  dt1              dt2    ...
______________...  __________
  |        |        |       |
  year=1  year=2    year3  ....
   ...     |
          -------------------------
          |                     |
  transaction_type_code = 1   transaction_type_code = 2

Each partition location path looks like this
hdfs://blabla-my-dwh/table_name/dt=1/year=1/transaction_type_code=1

Data files normally are located in the leaf path folders, though, you can specify some custom location in hdfs which is not in this hierarchy, but the metadata for patition should contain all columns because without all columns it is not possible identify the partition at all.

BTW if dt is a date and year can be derived from the same dt, then such partitioning makes no sense because each date will contain only one year inside.

Also if you are creating external table with three partition columns, make sure you have data already organized in hierarchical folders, unless you are going to specify some custom location for each partition manually. CREATE EXTERNAL TABLE will not re-organize data for you, it will not work if there are no data folders for each partition.

If you have existing partitioned table and want to repartition using different partitioned schema, then you need to reload data:

  1. Create new table with new partitioning
  2. Load data
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    
    insert overwrite table my_new_table partition (day, month, year) 
    select col1, col2 ...,
           day, month, year  --partition columns shold be the last
     from my_old_table;
  1. Drop old table (remove folder as well if you do not need it), rename new table if necessary
leftjoin
  • 36,950
  • 8
  • 57
  • 116