2

I am storing data in Hadoop according to the date they were created, so I have multiple directories on HDFS that are created based on the format /data/{year}/{month}/{day}.
I wish to load these data in Hive (periodically) and create the corresponding partitions. For the moment I am experimenting with several approaches like the one below

CREATE EXTERNAL TABLE tablename (...)
PARTITIONED BY (year STRING, month STRING, day STRING)
LOCATION '/data';

ALTER TABLE tablename ADD PARTITION(year='2014', month='10', day='13') LOCATION '/data/2014/10/13';

but with this approach I need to manually create each partition with the ALTER command. Is there any way to automate and parameterize this process (and put it in a workflow) to dynamically load data into Hive partitions for each one of the sub-directories ?

ypanag
  • 287
  • 5
  • 22
  • Check out if the solution here helps... http://stackoverflow.com/questions/13148187/hive-loading-in-partitioned-table – Arun A K Oct 13 '14 at 22:30

2 Answers2

1

How about a msck repair table ? This should automatically load all the partitions in the subdirectory. See here : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)

Amar
  • 3,825
  • 1
  • 23
  • 26
  • 1
    Thanks that was exactly what I needed, the only thing I needed to do was to rename the input directories to work (format /partitionkey=partitionvalue/... ) – ypanag Oct 14 '14 at 10:36
  • I don't think any command can do that automatically. This you might have to rename while inserting the data itself. – Amar Oct 14 '14 at 10:44
  • Yes of course I just did that and it worked. Just wrote that for completeness if anyone has the same issues – ypanag Oct 14 '14 at 10:52
  • @ypanag can you please share the code snippet – Anonymous Jun 12 '21 at 07:03
0

In order to create a dynamic partition in your case , you need to do the following while defining the partition columns

  1. For date - Define the partition as day(string date).
  2. For month - Define the partition as month(string date)
  3. For year - Define as year(string date).

Hope you understood the concept. Hope this helps you

K S Nidhin
  • 2,622
  • 2
  • 22
  • 44
  • Thing is I am able to create the partitions, my problem is that I have to do it everyday manually (with process described above). Ideally I would want to somehow create some kind of automation (e.g. a script) to load my new data into Hive everyday – ypanag Oct 13 '14 at 14:17