
I am currently working with both CDH and HDP. Block replication works very well on my CDH cluster, but on HDP it is much slower.

For example:

  • When I set the replication factor of a large HDFS directory (20 TB) to 2, HDFS needs to delete 2 million blocks.

  • When I set the replication factor of that directory back to 3, HDFS needs to resolve about 2 million under-replicated blocks.

On CDH this takes 3-5 hours or less to complete, but on HDP it took about 2 days.

I want to speed up the replication process on HDP.

I searched around and found that changing some HDFS replication settings may help. I also found that, unlike CDH, HDP does not override these properties and leaves them at their defaults:

  • dfs.namenode.replication.max-streams (default: 2, CDH: 20)
  • dfs.namenode.replication.max-streams-hard-limit (default: 4, CDH: 40)
  • dfs.namenode.replication.work.multiplier.per.iteration (default: 2, CDH: 30)

I changed the above settings on HDP to match the CDH values, but it made no difference. I hope someone can help!
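
For reference, the overrides described above would look roughly like the following hdfs-site.xml fragment. This is a sketch that assumes the CDH values from the list; on HDP these properties would normally be added through the custom hdfs-site section in Ambari rather than by editing the file by hand. On many Hadoop versions these values are read when the NameNode starts, so a NameNode restart may be needed before they take effect.

```xml
<!-- hdfs-site.xml (NameNode side): sketch of the replication overrides.
     Values are the CDH settings listed above, not tuned recommendations. -->
<configuration>
  <!-- Soft limit on concurrent replication streams per DataNode (default 2) -->
  <property>
    <name>dfs.namenode.replication.max-streams</name>
    <value>20</value>
  </property>
  <!-- Hard limit on replication streams per DataNode, including
       highest-priority blocks (default 4) -->
  <property>
    <name>dfs.namenode.replication.max-streams-hard-limit</name>
    <value>40</value>
  </property>
  <!-- Blocks scheduled per replication iteration, as a multiple of the
       number of live DataNodes (default 2) -->
  <property>
    <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
    <value>30</value>
  </property>
</configuration>
```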

Minh Ha Pham
