1

I need to create an external hive table on top of a csv file. CSV is having col1, col2, col3 and col4.

But my external hive table should be partitioned on month but my csv file doesn't have any month field. col1 is date field. How can I do this?

leftjoin
  • 36,950
  • 8
  • 57
  • 116

2 Answers2

1

You need to reload data into partitioned table.

  1. Create non-partitioned table (mytable) on top of folder with CSV.
  2. Create partitioned table (mytable_part)

    create table mytable_part( --columns specification here for col1, col2, col3, col4 ) partitioned by (part_month string) ... stored as textfile --you can chose any format you need

  3. Load data into partitioned table using dynamic partitioning, calculate partition column in the query:

    set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict;

    insert overwrite table mytable_part partition (part_month) select col1, col2, col3, col4, substr(col1, 1, 7) as part_month --partition column in yyyy-MM format from mytable distribute by substr(col1, 1, 7) --to reduce the number of files ;

leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Thanks but is there any way to do so without any intermediate table. –  May 11 '20 at 09:08
  • 1
    @ArpitaMishra Anyway you need to repartition data. You can use Spark to read files and write into partitioned table. But you cant create partitioned table on top of unpartitioned data and expect that hive will do re-partition it it for you. No. You need to re-partition data. Partition in Hive has it's own location with filers inside, and those files should contain only data which belongs to the partition. Intermediate table is not a big problem though – leftjoin May 11 '20 at 09:37
1

Try this way

Copy the csv data into a folder in HDFS location hdfs://somepath/5 and add that path to your external table as partition.

create external table ext1(
    col1   string
    ,col2  string
    ,col3  string
    ,col4  string
)
partition by (mm int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS ORC;

alter table ext1 add partition(mm = 5) location 'hdfs://yourpath/5';
Ganesh Chandrasekaran
  • 1,578
  • 12
  • 17