1

I have created a hive table with dynamic partitioning on a column. Is there a way to directly load the data from files using "LOAD DATA" statement? Or do we have to only depend on creating a non-partitioned intermediate table and load file data to it and then inserting data from this intermediate table to partitioned table as mentioned in Hive loading in partitioned table?

Community
  • 1
  • 1

2 Answers2

0

No, the LOAD DATA command ONLY copies the files to the destination directory. It doesn't read the records of the input file, so it CANNOT do partitioning based on record values.

If your input data is already split into multiple files based on partitions, you could directly copy the files to table location in HDFS under their partition directory manually created by you (OR just point to their current location in case of EXTERNAL table) and use the following ALTER command to ADD the partition. This way you could skip the LOAD DATA statement altogether.

ALTER TABLE <table-name>
ADD PARTITION (<...>)
janeshs
  • 793
  • 2
  • 12
  • 26
  • In case of External table, after placing the files in the directory , can we run the msck command to create the paritions automatically ? – av abhishiek Feb 01 '18 at 10:05
-1

No other go, if we need to insert directly, we'll need to specify partitions manually.

For dynamic partitioning, we need staging table and then insert from there.

appleboy
  • 661
  • 1
  • 9
  • 15