
I am running distcp from one Hadoop cluster (version 0.20.2) to another Hadoop cluster (version 2.2.0) using the command below.

hadoop distcp -update -skipcrccheck \
  "hftp://x.x.x.x:50070//hive/warehouse//staging_eventlog_arpu_comma" \
  "hdfs://y.y.y.y:9000//hive/warehouse/staging_eventlog_arpu_comma"

Since the data is copied from the source to the destination, I expected most of the network traffic to flow in that direction. However, network utilization is higher from the destination to the source than from the source to the destination.

The documentation for the -bandwidth option of hadoop distcp says:

  Each map will be restricted to consume only the specified bandwidth.
  This is not always exact.
  The map throttles back its bandwidth consumption during a copy,
  such that the net bandwidth used tends towards the specified value.

So what exactly does each map throttle back?
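
For reference, here is a minimal sketch of how the per-map limit would be passed on the command line. It assumes the distcp on the 2.2.0 cluster accepts -bandwidth (in MB/second per map) and -m (maximum number of simultaneous maps). The values 20 and 10 are hypothetical, chosen only for illustration. Because the limit applies to each map separately, the job-wide ceiling is roughly the number of maps times the per-map value.

```shell
#!/bin/sh
# Hypothetical values, chosen only for illustration.
MAPS=20          # passed to -m: maximum number of simultaneous copy maps
MB_PER_MAP=10    # passed to -bandwidth: per-map limit in MB/second

# The limit is per map, so the job-wide ceiling is roughly maps x per-map limit.
AGGREGATE_MB=$((MAPS * MB_PER_MAP))
echo "approximate aggregate ceiling: ${AGGREGATE_MB} MB/s"

# Print the command instead of running it, so the sketch works without a cluster.
echo hadoop distcp -update -skipcrccheck -m "$MAPS" -bandwidth "$MB_PER_MAP" \
  '"hftp://x.x.x.x:50070//hive/warehouse//staging_eventlog_arpu_comma"' \
  '"hdfs://y.y.y.y:9000//hive/warehouse/staging_eventlog_arpu_comma"'
```

With these example values the first line printed is "approximate aggregate ceiling: 200 MB/s". The documentation quoted above says the limit is not always exact, so this figure is an approximation, not a hard guarantee.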

axnet
user2950086

0 Answers