
I have a partitioned table in Hive, and I have set the dynamic partitions limit to 2000 per node:

set hive.exec.max.dynamic.partitions.pernode=2000

Now that the 2000 limit is being reached, I am facing a problem, so I am trying to understand whether there is a way to remove this limitation.

Simply put, I don't want to set any value for hive.exec.max.dynamic.partitions.pernode; it should handle any number of partitions.

So could someone please help me on this?

user2531569

2 Answers


I have met this problem before. Add `distribute by partition_column` at the end of the SQL:

insert overwrite table table_A partition (date_id)
select xxxxxxxx
from table_B
distribute by date_id;

With `distribute by`, rows with the same date_id value are shuffled into the same reducer, so a reducer processes a few specific date_id values instead of a random mix (which could include all date_id values).
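For completeness, a fuller sketch of this pattern (table_A, table_B, and date_id as in the answer; col1 and col2 are hypothetical column names, and the set statements are illustrative):

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table table_A partition (date_id)
select col1, col2, date_id      -- the dynamic partition column must come last
from table_B
distribute by date_id;          -- route all rows for a given date_id to one reducer
```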

Mike Gan
  • How is `distribute by` connected with the dynamic partitions limit? Whether you distribute or not, the number of partitions dynamically created will be the same. `distribute by` can help to reduce memory consumption. – leftjoin Sep 26 '16 at 08:13
  • `distribute by partition_column` redistributes the data so that rows with the same partition value are shuffled into one reducer. – Mike Gan Sep 26 '16 at 08:25
  • This will not change the number of partitions per node. – leftjoin Sep 26 '16 at 08:36
  • It's too hard for me to explain this in English; maybe you can write a sample SQL. – Mike Gan Sep 26 '16 at 08:50
  • If your data contains 15000 different partition keys, how many partitions will be created after load? With `distribute by` and without it? – leftjoin Sep 26 '16 at 09:17
  • Both will create 15000 partitions, but without `distribute by`, partition_A may be created by 100 nodes: each node generates 1% of the data, and each of those nodes counts +1 partition against hive.exec.max.dynamic.partitions.pernode. With `distribute by`, all data for partition_A is processed by one node, and that node only handles a certain data volume, so this reduces the number of partitions created per node. – Mike Gan Sep 26 '16 at 09:43
  • Thanks for your explanation! Finally I have found the same in the documentation: https://cwiki.apache.org/confluence/display/Hive/Tutorial – leftjoin Sep 26 '16 at 11:08
  • @MikeGan: Can you add this explanation from the comment into the answer, so that others can read it in the answer at a glance rather than expanding the comments? – K S Nidhin Sep 26 '16 at 19:29
  • Yes, please edit your answer and I will undo my downvote. – leftjoin Sep 29 '16 at 19:32
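The effect Mike Gan describes in the comments can be illustrated with a small simulation (the reducer count, key count, and row count here are hypothetical): without `distribute by`, rows for a given partition key land on many reducers, so every reducer ends up creating directories for almost every key; hashing by the partition key, which is what `distribute by` does, sends each key to exactly one reducer.

```python
import random
from collections import defaultdict

random.seed(0)

NUM_REDUCERS = 10
# Hypothetical workload: 30 distinct date_id partition keys, 10,000 rows.
PARTITION_KEYS = [f"2016-09-{d:02d}" for d in range(1, 31)]
rows = [random.choice(PARTITION_KEYS) for _ in range(10_000)]

# Without distribute by: rows arrive at reducers in effectively arbitrary
# order, so each reducer sees (and must create) almost every partition.
random_assign = defaultdict(set)
for i, key in enumerate(rows):
    random_assign[i % NUM_REDUCERS].add(key)

# With distribute by date_id: rows are routed by a hash of the partition
# key, so each key goes to exactly one reducer.
hash_assign = defaultdict(set)
for key in rows:
    hash_assign[hash(key) % NUM_REDUCERS].add(key)

max_without = max(len(keys) for keys in random_assign.values())
max_with = max(len(keys) for keys in hash_assign.values())
print("max partitions per reducer without distribute by:", max_without)
print("max partitions per reducer with distribute by:   ", max_with)
```

Both runs create the same 30 partitions in total, but the per-reducer (and hence per-node) count is much lower with hashing, which is why `distribute by` helps stay under hive.exec.max.dynamic.partitions.pernode.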

As far as I know, that can't be done: Hive enforces a limit on the number of dynamic partitions it can create. To my knowledge, this limitation exists because each partition is stored in a separate HDFS directory, so the total number is bounded in advance to prevent performance issues.
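If the cap cannot be removed, the usual workaround is to raise the related limits explicitly before running the insert. A sketch with illustrative values (tune them to your cluster; these settings raise, but do not remove, Hive's dynamic-partition safety limits):

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=20000;        -- total across the whole job
set hive.exec.max.dynamic.partitions.pernode=5000; -- per mapper/reducer node
set hive.exec.max.created.files=200000;            -- total HDFS files the job may create
```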

Jaime Caffarel