
I am trying to copy data from one HDFS cluster to another. Any suggestions as to why the first command works but the second does not?

(works)

hadoop distcp hdfs://abc.net:8020/foo/bar webhdfs://def.net:14000/bar/foo

(does not work)

hadoop distcp webhdfs://abc.net:50070/foo/bar webhdfs://def:14000/bar/foo

Thanks!

Rio mario
  • Please share the error log – Sandeep Singh Jun 12 '15 at 17:28
  • Do you think both ways are correct? The source and destination are running different versions of Hadoop. – Rio mario Jun 12 '15 at 17:48
  • If the source has a lower version of MR and the destination has a higher version, there will be an issue. To overcome this, you should use `webhdfs`. Both of your commands seem OK, but in your second command the destination NameNode port should be `50070`. Can you cross-check that it is running on the right port? You can verify this by accessing it through a web browser. – Sandeep Singh Jun 12 '15 at 18:01
  • WebHDFS is based on HTTP operations such as GET, PUT, POST and DELETE, so the NameNode HTTP port should be given there. – Sandeep Singh Jun 12 '15 at 18:07

1 Answer


If the two clusters are running incompatible versions of HDFS, you can use the webhdfs protocol to distcp between them.

hadoop distcp webhdfs://namenode1:50070/source/dir webhdfs://namenode2:50070/destination/dir

If you are using webhdfs, both the source and the destination URI should give the NameNode host and the NameNode HTTP port.
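One way to check that a host and port really serve WebHDFS before running distcp is to request a directory listing over the REST API. The sketch below only builds the probe URL; `webhdfs_probe_url` is a hypothetical helper name, `namenode1` and `/source/dir` are the placeholders from the command above, and `50070` is assumed to be the NameNode HTTP port on that cluster (it may be configured differently).

```shell
# Hypothetical helper: build a WebHDFS REST URL that lists a directory.
# Arguments: host, HTTP port, absolute HDFS path.
webhdfs_probe_url() {
  host="$1"
  port="$2"
  path="$3"
  # The WebHDFS REST API lives under /webhdfs/v1 on the HTTP port.
  printf 'http://%s:%s/webhdfs/v1%s?op=LISTSTATUS\n' "$host" "$port" "$path"
}

# Print the probe URL for the source cluster (assumed host and port).
webhdfs_probe_url namenode1 50070 /source/dir

# To actually probe, pass the URL to curl on a machine that can reach the cluster:
#   curl -i "$(webhdfs_probe_url namenode1 50070 /source/dir)"
```

If the probe returns an HTTP 200 response with a JSON listing, the same host and port should be usable in a `webhdfs://` URI. A connection error or a non-WebHDFS response suggests the port is not the HTTP endpoint.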

Sandeep Singh