Assuming an S3 location with required data is of the form:
s3://stack-overflow-example/v1/
where each file title in v1/
is of the form
francesco_{YYY_DD_MM_HH}_totti.csv
and each csv file contains a unix timestamp as a column in each row.
Is it possible to create an external hive table partitioned by the {YYY_DD_MM_HH} in each file name without first creating an unpartitioned table?
I have tried the below:
create external table so_test
(
a int,
b int,
unixtimestamp string
)
PARTITIONED BY (
from_unixtime(CAST(ord/1000 as BIGINT), 'yyyy-MM-dd') string
)
LOCATION 's3://stack-overflow-example/v1'
but this fails.
An option that should work is creating an unpartitioned table like the below:
create external table so_test
(
a int,
b int,
unixtimestamp string
);
LOCATION 's3://stack-overflow-example/v1'
and then dynamically inserting into a partitioned table:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
create external table so_test_partitioned
(
a int,
b int,
unixtimestamp string
)
PARTITIONED BY (
datep string
)
LOCATION 's3://stack-overflow-example/v1';
INSERT OVERWRITE TABLE so_test_partitioned PARTITION (date)
select
a,
b,
unixtimestamp,
from_unixtime(CAST(ord/1000 as BIGINT), 'yyyy-MM-dd') as datep,
from so_test;
Is creating an unpartitioned table first the only way?