
I have sqooped data from a Netezza table and the output file is in HDFS, but one column is a timestamp and I want to load it as a date column in my Hive table. Using that column I want to create a partition by date. How can I do that?

Example: the data in HDFS looks like 2013-07-30 11:08:36

In Hive I want to load only the date (2013-07-30), not the timestamp. I want to partition on that column daily.

How can I pass the partition-by column dynamically?

I have tried loading the data into one table as a source. In the final table I would do INSERT OVERWRITE TABLE ... PARTITION (date_column=<dynamic date>) SELECT * FROM table1.
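For reference, the two-table setup being described might be declared roughly as follows. This is only a sketch; the table and column names (staging_table, final_table, event_ts, event_date) are hypothetical placeholders, not names from the question:

```sql
-- Hypothetical staging table holding the raw sqooped data,
-- including the full timestamp column from Netezza.
CREATE TABLE staging_table (
  id INT,
  name STRING,
  event_ts TIMESTAMP
);

-- Hypothetical target table, partitioned daily on a date string
-- derived from the timestamp. The partition column is declared in
-- PARTITIONED BY, not in the column list.
CREATE TABLE final_table (
  id INT,
  name STRING
)
PARTITIONED BY (event_date STRING);
```

The answers below show how to populate such a target with a dynamic-partition insert.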

DanM7
kumar s
2 Answers


Set these two properties:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

And the query can look like this:

INSERT OVERWRITE TABLE <target_table> PARTITION (DATE_STR)
SELECT
  :
  :
  -- the partition column must be the last column in the SELECT list
  to_date(date_column) AS DATE_STR
FROM table1;

You can also explore these two options of hive-import; if it is an incremental import, you will be able to get the current day's partition:

--hive-partition-key    
--hive-partition-value 
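As a sketch of how those flags fit into an incremental import (the connection string, table, and column names here are hypothetical placeholders, not values from the question):

```shell
# Hypothetical incremental Sqoop import from Netezza into a single
# daily Hive partition. Adjust host, database, table, and column
# names for your environment.
sqoop import \
  --connect jdbc:netezza://nz-host:5480/mydb \
  --username myuser -P \
  --table SOURCE_TABLE \
  --incremental lastmodified \
  --check-column ts_column \
  --last-value "2013-07-30 00:00:00" \
  --hive-import \
  --hive-table emp_history \
  --hive-partition-key date_str \
  --hive-partition-value "2013-07-30"
```

Note that --hive-partition-value is a single static value per run, so each day's import would target that day's partition.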
user 923227
  • Hi, thanks for your info. I have tried to_date(date_column) ... with the partition col as the last column, to_date(date_column) date_str from table1, but it is not working; it's giving a parse exception on to_date(col_name) – kumar s Jan 29 '15 at 19:24
  • Instead of INSERT OVERWRITE, is there any way to write only new or updated rows, rather than rewriting all the data in the table again? – kumar s Jan 29 '15 at 19:26
  • Possibly the dates are not well formed - check the task-tracker logs. Sqoop has an option for incremental imports: http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_incremental_imports You can explore the two options of hive-import - if it is an incremental import you will be able to get the current day's partition: --hive-partition-key --hive-partition-value – user 923227 Jan 29 '15 at 20:09

You can just load the EMP_HISTORY table from EMP by enabling dynamic partitioning and converting the timestamp to a date using the to_date() function.

The code might look something like this:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE EMP_HISTORY PARTITION (join_date)
SELECT e.name AS name,
       e.age AS age,
       e.salary AS salary,
       e.loc AS loc,
       to_date(e.join_date) AS join_date
FROM EMP e;
Anil