6

I have created a table in Hive with the following query:

create table if not exists employee(CASE_NUMBER String,
                                         CASE_STATUS String,
                                         CASE_RECEIVED_DATE DATE,
                                         DECISION_DATE  DATE,
                                         EMPLOYER_NAME STRING,
                                         PREVAILING_WAGE_PER_YEAR BIGINT,
                                         PAID_WAGE_PER_YEAR BIGINT,
                                         order_n int) partitioned by (JOB_TITLE_SUBGROUP STRING) row format delimited fields terminated by ',';
                                         

I tried loading data into the create table using below query:

LOAD DATA INPATH '/salary_data.csv' overwrite into table employee  partition (JOB_TITLE_SUBGROUP);

For the partitioned table, I have even set following configuration :

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

But I am getting below error while executing the load query:

Your query has the following error(s):

Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [job_title_subgroup, ], values [])

Please help.

Community
  • 1
  • 1
Mohit Sudhera
  • 341
  • 1
  • 4
  • 16

2 Answers2

3

If you want to load data into a Hive partition, you have to provide the value of the partition itself in the LOAD DATA query. So in this case, your query would be something like this.

LOAD DATA INPATH '/salary_data.csv' overwrite into table employee partition (JOB_TITLE_SUBGROUP="Value");

Where "Value" is the name of the partition in which you are loading your data. The reason is because Hive will use "Value" to create the directory in which your .csv is going to be stored, which will be something like this: .../employee/JOB_TITLE_SUBGROUP=Value. I hope this helps.

Check the documentation for details on the LOAD DATA syntax.

EDITED

Since the table has dynamic partition, one solution would be loading the .csv into an external table (e.g. employee_external) and then execute an INSERT command like this:

INSERT OVERWRITE INTO TABLE employee PARTITION(JOB_TITLE_SUBGROUP)
SELECT CASE_NUMBER, CASE_STATUS, (...), JOB_TITLE_SUBGROUP
FROM employee_external
Jaime Caffarel
  • 2,401
  • 4
  • 30
  • 42
  • 1
    Thanks for your answer Jaime, but I am experimenting with dynamic partitioning. The scenario you have mentioned is for static partitioning. Static partitioning is working fine but I am facing a problem with dynamic partitioning. – Mohit Sudhera Aug 07 '16 at 17:56
  • 1
    In that case, I think that you have to use an `INSERT` clause instead of a `LOAD DATA`. The loading command just copies files to the destination directory, it doesn't read the records of the file. – Jaime Caffarel Aug 07 '16 at 18:03
  • I think this would be useful: http://stackoverflow.com/questions/13148187/hive-loading-in-partitioned-table – Jaime Caffarel Aug 07 '16 at 18:04
  • Thanks Jaime, it Worked :) – Mohit Sudhera Aug 08 '16 at 09:50
2

I might be little late to reply but can try below steps:

  1. Set below properties first :

    Ø set hive.exec.dynamic.partition.mode=nonstrict;
    Ø set hive.exec.dynamic.partition=true;
    
  2. Create temp table first:

    CREATE EXTERNAL TABLE IF NOT EXISTS employee_temp(
    ID STRING,
    Name STRING,
    Salary STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    tblproperties ("skip.header.line.count"="1");
    
  3. Load Data in temporary table:

    hive> LOAD DATA INPATH 'filepath/employee.csv' OVERWRITE INTO TABLE employee; 
    
  4. Create Partitioned Table:

    CREATE EXTERNAL TABLE IF NOT EXISTS employee_part(
    ID STRING,
    Name STRING)
    PARTITIONED BY (Salary STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    tblproperties ("skip.header.line.count"="1");
    
  5. Load Data into partitioned table from intermediate / temp table:

    INSERT OVERWRITE TABLE employee_part PARTITION (SALARY) SELECT * FROM employee;
    
Milo
  • 3,365
  • 9
  • 30
  • 44