Hive Partition By dynamic value in s3 file name

Question

Assuming an S3 location with required data is of the form:

s3://stack-overflow-example/v1/

where each file title in v1/ is of the form

francesco_{YYY_DD_MM_HH}_totti.csv

and each csv file contains a unix timestamp as a column in each row.

Is it possible to create an external hive table partitioned by the {YYY_DD_MM_HH} in each file name without first creating an unpartitioned table?

I have tried the below:

create external table so_test 
(
a int,
b int,
unixtimestamp string
)
PARTITIONED BY (
 from_unixtime(CAST(ord/1000 as BIGINT), 'yyyy-MM-dd') string
)
LOCATION 's3://stack-overflow-example/v1'

but this fails.

An option that should work is creating an unpartitioned table like the below:

create external table so_test 
(
a int,
b int,
unixtimestamp string
);
LOCATION 's3://stack-overflow-example/v1'

and then dynamically inserting into a partitioned table:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

create external table so_test_partitioned 
(
a int,
b int,
unixtimestamp string
)
PARTITIONED BY (
 datep string
)
LOCATION 's3://stack-overflow-example/v1';



INSERT OVERWRITE TABLE so_test_partitioned PARTITION (date)
select 
a,
b,
unixtimestamp,
from_unixtime(CAST(ord/1000 as BIGINT), 'yyyy-MM-dd') as datep,
from so_test;

Is creating an unpartitioned table first the only way?

Hive Partition By dynamic value in s3 file name

0 Answers0