16

I have a log file in HDFS, values are delimited by comma. For example:

2012-10-11 12:00,opened_browser,userid111,deviceid222

Now I want to load this file to Hive table which has columns "timestamp","action" and partitioned by "userid","deviceid". How can I ask Hive to take that last 2 columns in log file as partition for table? All examples e.g. "hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');" require definition of partitions in the script, but I want partitions to set up automatically from HDFS file.

The one solution is to create intermediate non-partitioned table with all that 4 columns, populate it from file and then make an INSERT into first_table PARTITION (userid,deviceid) select from intermediate_table timestamp,action,userid,deviceid; but that is and additional task and we will have 2 very similiar tables.. Or we should create external table as intermediate.

Nishu Tayal
  • 20,106
  • 8
  • 49
  • 101
Valery Yesypenko
  • 412
  • 3
  • 6
  • 14

5 Answers5

16

Ning Zhang has a great response on the topic at http://grokbase.com/t/hive/user/114frbfg0y/can-i-use-hive-dynamic-partition-while-loading-data-into-tables.

The quick context is that:

  1. Load data simply copies data, it doesn't read it so it cannot figure out what to partition
  2. Would suggest that you load data into an intermediate table first (or using an external table pointing to all the files) and then letting partition dynamic insert to kick in to load it into a partitioned table
Denny Lee
  • 3,154
  • 1
  • 20
  • 33
  • Thanks, Denny. That's how I've solved this problem - using External table that points to raw file and then the data from it is inserted into partitioned table with selected partitions. This is the only simple solution of such case. – Valery Yesypenko Nov 05 '12 at 12:30
  • Yes, it's a magic :) I've solved it at the same time when you added your comment :) Maybe you can help me with another issue? http://stackoverflow.com/questions/13240409/hadoop-eof-exception-after-map-step – Valery Yesypenko Nov 05 '12 at 21:09
5
  1. As mentioned in @Denny Lee's answer, we need to involve a staging table(invites_stg) managed or external and then INSERT from staging table to partitioned table(invites in this case).

  2. Make sure we have these two properties set to:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    
  3. And finally insert to invites,

    INSERT OVERWRITE TABLE India PARTITION (STATE) SELECT COL's FROM invites_stg;
    

Refer this link for help: http://www.edupristine.com/blog/hive-partitions-example

bekce
  • 3,782
  • 29
  • 30
appleboy
  • 661
  • 1
  • 9
  • 15
2

I worked this very same scenario, but instead, what we did is create separate HDFS data files for each partition you need to load.

Since our data is coming from a MapReduce job, we used MultipleOutputs in our Reducer class to multiplex the data into their corresponding partition file. Afterwards, it is just a matter of building the script using the Partition from the HDFS file name.

Mauricio Morales
  • 988
  • 1
  • 9
  • 16
2

How about

LOAD DATA INPATH '/path/to/HDFS/dir/file.csv' OVERWRITE INTO TABLE DB.EXAMPLE_TABLE PARTITION (PARTITION_COL_NAME='PARTITION_VALUE');

user2720864
  • 8,015
  • 5
  • 48
  • 60
  • 1
    This worked for me. Upvoted! Not sure why it was down voted – Kumar Aug 26 '19 at 14:25
  • Maybe more explanation? I upvoted because this is the syntax I was trying to figure out. It's basically what I got from Denny's link. – ldmtwo Sep 19 '19 at 15:58
-1
CREATE TABLE India (

OFFICE_NAME STRING,

OFFICE_STATUS     STRING,

PINCODE           INT,

TELEPHONE   BIGINT,

TALUK       STRING,

DISTRICT    STRING,

POSTAL_DIVISION   STRING,

POSTAL_REGION     STRING,

POSTAL_CIRCLE     STRING

)

PARTITIONED BY (STATE   STRING)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ','

STORED AS TEXTFILE;

5. Instruct hive to dynamically load partitions

SET hive.exec.dynamic.partition = true;

SET hive.exec.dynamic.partition.mode = nonstrict;
Glorfindel
  • 21,988
  • 13
  • 81
  • 109
  • Welcome to SO. Could you format your answer as code, e.g. by adding 4 blanks in front of every code line, please? Please add also some more information instead of a link. Thanks. – CKE Sep 12 '18 at 07:40
  • Please do not link just to your website, see the [help center](/help/promotion). – Glorfindel Sep 12 '18 at 07:42