16

I was researching partitions in Hive and came upon:

http://www.brentozar.com/archive/2013/03/introduction-to-hive-partitioning/ In this link, the author says: “When inserting data into a partition, it’s necessary to include the partition columns as the last columns in the query. Thecolumn names in the source query don’t need to match the partition column names, but they really do need to be last – there’s no way to wire up Hive differently”

I have a query like:

insert overwrite table MyDestTable PARTITION (partition_date)
select
grid.partition_date,
….

I have the above query that has been running for a while without errors. As you can see, I am selecting the partition column as the very first column. Is it wrong? I have tried to corroborate the author’s statement from other sources but am not finding other documents that say the same. Does anybody here know what the right thing to do is? From my end, being a Hive newbie, I am just going by whether Hive is complaining or not (which it is not).

KS

KS1234
  • 161
  • 1
  • 1
  • 4

3 Answers3

34

example:

set hive.exec.dynamic.partition=true;  
set hive.exec.dynamic.partition.mode=nonstrict;  

drop table tmp.table1;

create table tmp.table1(  
col_a string,col_b int)  
partitioned by (ptdate string,ptchannel string)  
row format delimited  
fields terminated by '\t' ;  

insert overwrite table tmp.table1 partition(ptdate,ptchannel)  
select col_a,count(1) col_b,ptdate,ptchannel
from tmp.table2
group by ptdate,ptchannel,col_a ;
anyoneking
  • 536
  • 3
  • 4
10

Yes, It is mandatory to use partitioned column as last column while inserting the data. Make sure PARTITIONED BY column shouldn't be an existing column in the table.Hive will take care the rest.

CREATE EXTERNAL TABLE temp (
DATA_OWNER STRING,
DISTRICT_CODE STRING,
BILLING_ACCOUNT_NO STRING,
INST_COUNTY STRING,
INST_POST_CODE STRING,
INST_STATUS STRING,
INST_EXCHANGE_GROUP_CODE STRING,
EXCHANGE_CODE STRING
) PARTITIONED BY (TS_LAST_UPDATED STRING)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE LOCATION 'user/entity/site/inbound/CSS_INSTALLATION_PARTITIONED';

INSERT OVERWRITE TABLE temp PARTITION (TS_LAST_UPDATED)
SELECT
DATA_OWNER,
DISTRICT_CODE,
BILLING_ACCOUNT_NO,
INST_COUNTY,
INST_POST_CODE,
INST_STATUS,
INST_EXCHANGE_GROUP_CODE,
EXCHANGE_CODE,TO_DATE(TS_LAST_UPDATED) FROM temp1 
Satheesh
  • 121
  • 2
  • 6
  • 1
    welcome to the stackoverflow community. To make your answers more readable, please go through editing help here http://stackoverflow.com/editing-help – sid-m Dec 23 '16 at 05:31
9

The dynamic partition columns must be specified last among the columns in the SELECT statement and in the same order in which they appear in the PARTITION() clause.

see hive wiki for more information.

pensz
  • 1,871
  • 1
  • 13
  • 18