
I've created a Hive table whose base location points to an AWS S3 location. However, I want to create a partition on the HDFS cluster using an 'Insert Overwrite' query.

Steps below:

-- Create intermediate table
create table test_int_ash
( loc string)
partitioned by (name string, age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile
location '/user/ash/test_int';

-- Insert into the intermediate table with two names, 'rash' and 'nash'
INSERT INTO test_int_ash partition (name="rash",age=20) values ('brisbane');
INSERT INTO test_int_ash partition (name="rash",age=30) values ('Sydney');
INSERT INTO test_int_ash partition (name="rash",age=40) values ('Melbourne');
INSERT INTO test_int_ash partition (name="rash",age=50) values ('Perth');

INSERT INTO test_int_ash partition (name="nash",age=50) values ('Auckland');
INSERT INTO test_int_ash partition (name="nash",age=40) values ('Wellington');


-- create curated table
create external table test_curated_ash
( loc string)
partitioned by (name string, age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile
location 's3a://mybucket/tmp/test_curated/'; 

-- Load the curated table from the intermediate table using the dynamic partition method; this creates partitions on AWS S3.
insert overwrite table test_curated_ash partition(name='rash',age)
select loc,age from test_int_ash where name='rash' ;

-- I want to keep this partition on the HDFS cluster; the query below doesn't work

insert overwrite table test_curated_ash partition(name='nash',age) location 'hdfs://mynamenode/user/ash/test_curated_new'
select loc,age from test_int_ash where name='nash';

The query below works, but I don't want to handle it with the 'static partition' method.

alter table test_curated_ash add partition(name='nash',age=40) location 'hdfs://swmcdh1/user/contexti/ash/test_curated_new/name=nash/age=40';
alter table test_curated_ash add partition(name='nash',age=50) location 'hdfs://swmcdh1/user/contexti/ash/test_curated_new/name=nash/age=50';

insert overwrite table test_curated_ash partition(name='nash',age)
select loc,age from test_int_ash where name='nash';

Could you please help me with how to set the partition location in an 'Insert Overwrite' dynamic partition query?

Ash

2 Answers


Suppose I have a table named "user" and I want to partition it dynamically using the country column.

Query :

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=1000;

INSERT OVERWRITE TABLE partitioned_user
PARTITION (country)
SELECT firstname, lastname, address, city, salary, post, phone1, phone2,
       email, web, country
FROM user;

When inserting data into a partition, it’s necessary to include the partition columns as the last columns in the query.

Set hive.exec.dynamic.partition.mode=nonstrict if it is currently set to strict.

In MapReduce strict mode (hive.mapred.mode=strict), some risky queries are not allowed to run. They include:

  1. Cartesian products.
  2. Queries where no partition is picked up.
  3. Comparing bigints and strings.
  4. Comparing bigints and doubles.
  5. ORDER BY without LIMIT.

According to points 2 and 5, on partitioned tables we cannot run SELECT statements without at least one partition key filter (like WHERE country='US'), or use an ORDER BY clause without a LIMIT condition. By default, though, this property is set to nonstrict.
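For example (a sketch reusing the test_int_ash table from the question):

set hive.mapred.mode=strict;

-- Rejected: no partition filter on the partitioned table test_int_ash.
SELECT loc FROM test_int_ash;

-- Allowed: the WHERE clause filters on the partition key 'name'.
SELECT loc FROM test_int_ash WHERE name = 'rash';

-- Rejected: ORDER BY without LIMIT.
SELECT loc FROM test_int_ash WHERE name = 'rash' ORDER BY loc;

-- Allowed: LIMIT bounds the result.
SELECT loc FROM test_int_ash WHERE name = 'rash' ORDER BY loc LIMIT 10;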

Gaurav Varshney
  • I appreciate your answer, but I'm looking for a way to set the partition location while doing an 'insert overwrite' with the dynamic partition method. Unfortunately, I'm unable to dictate that the partition location be on the HDFS cluster. If you read my code, you will understand what I'm trying to do. – Ash May 31 '18 at 04:17

You can create the data with partitions on HDFS using another intermediate table.

Then change the location of the partitions in the final table to point to the new location by doing something like:

use dbname;
ALTER TABLE table_name PARTITION (partname=value) SET LOCATION "location";
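Putting that together with the tables from the question, the flow would look roughly like this (a sketch; the staging table name test_curated_hdfs and the HDFS path are illustrative assumptions):

-- Staging table on HDFS with the same schema as the curated table:
create external table test_curated_hdfs
( loc string)
partitioned by (name string, age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile
location 'hdfs://mynamenode/user/ash/test_curated_new';

-- The dynamic-partition insert lays the data out under the HDFS location:
insert overwrite table test_curated_hdfs partition(name='nash',age)
select loc,age from test_int_ash where name='nash';

-- Repoint the final table's partitions at the HDFS paths, one statement
-- per partition (SHOW PARTITIONS test_curated_hdfs lists the values; use
-- ADD PARTITION ... LOCATION instead if the partition doesn't exist yet):
ALTER TABLE test_curated_ash PARTITION (name='nash',age=40)
SET LOCATION 'hdfs://mynamenode/user/ash/test_curated_new/name=nash/age=40';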

Or you can directly update the Hive metastore table SDS for the appropriate SD_ID.
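For the metastore route, the row to change can be located by joining PARTITIONS to SDS (a sketch run against the metastore database itself; back it up first, and verify the table and column names against your metastore version):

-- Find the SD_ID and current location for each partition of the table:
SELECT p.PART_NAME, s.SD_ID, s.LOCATION
FROM PARTITIONS p
JOIN SDS s ON p.SD_ID = s.SD_ID
JOIN TBLS t ON p.TBL_ID = t.TBL_ID
WHERE t.TBL_NAME = 'test_curated_ash';

-- Then repoint a partition (prefer ALTER TABLE ... SET LOCATION when possible):
UPDATE SDS
SET LOCATION = 'hdfs://mynamenode/user/ash/test_curated_new/name=nash/age=40'
WHERE SD_ID = <sd_id_from_above>;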

Saurav Bhowmick
  • Hi Saurav, I hope you read my question correctly. There are multiple sub-partitions which are dynamic, and I cannot hard-code them (in your example, partname=value). If it were only a single-column partition it would be easy, but due to the dynamic multiple sub-partitions I'm unable to do it. Anyway, I found an alternative and will shortly write a blog about it. - Ta – Ash Sep 01 '18 at 08:04