1

I already have a Hive partitioned table. I needed to add a new column to the table, so i used ALTER to add the column like below.

ALTER TABLE TABLE1 ADD COLUMNS(COLUMN6 STRING);

I have my final table load query like this:

INSERT OVERWRITE table Final table  PARTITION(COLUMN4, COLUMN5)
select
stg.Column1,
stg.Column2,
stg.Column3,
stg.Column4(Partition Column),Field Name:Code Sample value - YAHOO.COM
stg.Column5(Partition Column),Field Name:Date Sample Value - 2021-06-25
stg.Column6(New Column)       Field Name:reason     sample value - Adjustment
from (
         select fee.* from (
             select 
               fees.* , 
               ROW_NUMBER() OVER (PARTITION BY fees.Column1 ORDER BY fees.Column3 DESC) as RNK
             from Stage table fee
         ) fee
         where RNK = 1
     ) stg
     left join (
         select Column1 from Final table
         where Column5(date) in (select distinct column5(date) from Stage table)
     ) TGT
     on tgt.Column1(id) = stg.Column1(id) where tgt.column1 is null 
UNION
select 
tgt.column1(id),
tgt.column2,
tgt.column3,
tgt.column4(partiton column),
tgt.column5(partiton column-date),
tgt.column6(New column)
from 
Final Table TGT
      WHERE TGT.Column5(date) in (select distinct column5(date) from Stage table);"

Now when my job ran today, and when i try to query the final table, i get the below error

Invalid partition value 'Adjustment' for DATE partition key: Code=2021-06-25/date=Adjustment

I can figure out something wrong happend around the partition column but unable to figure out what went wrong..Can someone help?

leftjoin
  • 36,950
  • 8
  • 57
  • 116
jahan
  • 103
  • 4
  • 19

1 Answers1

1

Partition columns should be the last ones in the select. When you add new column it is being added as the last non-partition column, partition columns remain the last ones, they are not stored in the datafiles, only metadata contains information about partitions. All other columns order also matters, it should match table DDL, check it using DESCRIBE FORMATTED table_name.

INSERT OVERWRITE table Final table  PARTITION(COLUMN4, COLUMN5)
select
stg.Column1,
stg.Column2,
stg.Column3,
stg.Column6 (New column) ------------New column
stg.Column4(Partition Column)  ---partition columns
stg.Column5(Partition Column)
...
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • @leftjoin Thank you for the response. what should i do now so that i dont lose the data in the final table and correct this issue? – jahan Jun 27 '21 at 15:02
  • @jahan Just drop wrong partitions you created previously, like this: Code=2021-06-25/date=Adjustment. Your data should be fine except those partitions you additionally created during insert overwrite. Try ALTER TABLE DROP PARTITION (Code='2021-06-25',date='Adjustment'). Alternatively remofe those partition folders from HDFS manually, then run MSCK REPAIR – leftjoin Jun 27 '21 at 17:00
  • @leftjoin even if i drop the wrong partitions, the older data might also got corrupted because its an insert overwrite final table..i may be wrong, but please correct me. I'm now worried if i lose the older data – jahan Jun 28 '21 at 02:14
  • @jahan If it was as you described, then your insert created additional partitions and original data shold be as it were before. insert overwrite affects only those partitions which exist in the dataset returned by select, it does not delete partitions which not returned by select. But it is always good Idea to make full copy of table location if you are not sure what will happen next. Then proceed w next step Drop partition most probably will not work because of datatype. Check what is in HDFS, remove wrong partitions locations manually, make table External by setting property, drop/create. – leftjoin Jun 28 '21 at 02:32
  • @leftjoin makes some sense now, i just checked my hdfs location, and it seems until last thursday, i have the right partition, and after the change on friday, 3 additional partitions created for 3 job runs each day. So after taking a full location backup, can i remove the newly added partition file from the location? and if i run a msck repair on the table, i should be able to again see the o/p from the table like before, but only till last thursday's data. correct? – jahan Jun 28 '21 at 03:22
  • Yes , proceed with removing wrong partitions folders, only with wrong date – leftjoin Jun 28 '21 at 09:05