I am running distcp from one Hadoop cluster (version 0.20.2) to another Hadoop cluster (version 2.2.0) using the command below:
hadoop distcp -update -skipcrccheck \
  "hftp://x.x.x.x:50070//hive/warehouse//staging_eventlog_arpu_comma" \
  "hdfs://y.y.y.y:9000//hive/warehouse/staging_eventlog_arpu_comma"
Since the data flows from source to destination, I would expect most of the bandwidth to be used in that direction. However, the network utilization I observe is higher from destination to source than from source to destination.
The documentation for the hadoop distcp -bandwidth option says:
Each map will be restricted to consume only the specified bandwidth.
This is not always exact.
The map throttles back its bandwidth consumption during a copy,
such that the net bandwidth used tends towards the specified value.
So what exactly does it throttle back?
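
For context, here is a rough sketch of how I understand the per-map limit would add up for a job like the one above. The values of MAPS and MB_PER_MAP are made-up numbers for illustration, not what I actually ran, and the arithmetic assumes every map reaches its cap at the same time, which the documentation says is not always exact.

```shell
#!/bin/sh
# Hypothetical settings, chosen only for illustration.
MAPS=20          # upper bound on simultaneous copies (-m)
MB_PER_MAP=10    # per-map limit in MB/s (-bandwidth)

# Worst-case aggregate rate if all maps hit their cap together.
TOTAL_MB=$((MAPS * MB_PER_MAP))
echo "Approximate aggregate cap: ${TOTAL_MB} MB/s"

# The copy itself would then look like this (left commented out so the
# sketch runs without a cluster):
# hadoop distcp -update -skipcrccheck -m "$MAPS" -bandwidth "$MB_PER_MAP" \
#   "hftp://x.x.x.x:50070//hive/warehouse//staging_eventlog_arpu_comma" \
#   "hdfs://y.y.y.y:9000//hive/warehouse/staging_eventlog_arpu_comma"
```

The limit applies to each map separately, so the number of maps matters as much as the -bandwidth value when estimating the total load on the link.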