I am facing an issue while executing a distcp command between two different Hadoop clusters:

Caused by: java.io.IOException: Mismatch in length of source:hdfs://ip1/xxxxxxxxxx/xxxxx and target:hdfs://nameservice1/xxxxxx/.distcp.tmp.attempt_1483200922993_0056_m_000011_2

I tried using -pb and -skipcrccheck:

hadoop distcp -pb -skipcrccheck -update hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/ 

hadoop distcp -pb  hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/ 

hadoop distcp -skipcrccheck -update hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/ 

but nothing seems to be working.

Any solutions, please?

Aditya
  • Are the versions of the two clusters the same? I think there is an incompatibility between the versions of the two clusters. Based on _Hadoop: The Definitive Guide_, try: `If the two clusters are running incompatible versions of HDFS, then you can use the webhdfs protocol to distcp between them: % hadoop distcp webhdfs://namenode1:50070/foo webhdfs://namenode2:50070/foo` – Thomas8 Jan 09 '17 at 10:33
  • The versions (CDH 5.8.2) are the same; I even tried with webhdfs, but still got the same error. – Aditya Jan 09 '17 at 16:51
  • Does it always fail for everything, or sometimes for specific files? And does it fail like this in 2 directions or just from cluster 1 to cluster 2? – Dennis Jaheruddin Jan 29 '17 at 15:42
  • It fails if there are any bad/corrupt files; yes, it can fail in both directions, cluster 1 to cluster 2 and vice versa. – Aditya Jan 29 '17 at 15:56

3 Answers

I was facing the same issue with distcp between two Hadoop clusters of exactly the same version. For me, it turned out to be caused by some files in one of the source directories that were still open for write. Once I ran distcp for each source directory individually, I was able to confirm this: it worked fine for every directory except the one with the open files, and failed only for those files. Of course, it's hard to tell at first blush.
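
A quick way to spot files that are still open for write (not part of the original answer; it assumes a standard HDFS client, and the path is the elided one from the question) is fsck's -openforwrite option:

    hdfs fsck hdfs://ip1/xxxxxxxxxx -openforwrite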

Edi Bice

The issue was resolved by performing a copyToLocal from cluster1 to the local Linux filesystem, followed by a copyFromLocal to cluster2.
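
For reference, the workaround looks roughly like this (the staging directory /tmp/distcp-staging is a hypothetical placeholder; the HDFS paths are the elided ones from the question):

    hadoop fs -copyToLocal hdfs://ip1/xxxxxxxxxx/xxxxx /tmp/distcp-staging
    hadoop fs -copyFromLocal /tmp/distcp-staging hdfs://nameservice1/xxxxxx

Note that this only works when the data fits on a single local disk and gives up distcp's parallelism, so it is practical only for small datasets.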

Aditya
  1. Check the status of the source files:

    hdfs fsck hdfs://xxxxxxxxxxx

  2. If a source file is not closed, close it by recovering its lease:

    hdfs debug recoverLease -path hdfs://xxxxxxx

  3. Rerun distcp; -bandwidth 15 caps each map task at roughly 15 MB/s, -m 50 limits the job to 50 map tasks, and -pb preserves the source block size (see the sanity check below):

    hadoop distcp -bandwidth 15 -m 50 -pb hdfs://xxxxxx hdfs://xxxxxx
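
If the job completes, a cheap sanity check (an addition, not part of the original answer; the paths are the elided ones from the question) is to compare the aggregate sizes reported on both clusters:

    hadoop fs -du -s hdfs://ip1/xxxxxxxxxx/xxxxx
    hadoop fs -du -s hdfs://nameservice1/xxxxxx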

tom lee