We are moving data between clusters partition by partition, and we are required to use only the -update -skipcrccheck options for this. Running distcp partition by partition with these options requires the partition directory to already exist at the destination. To create it, I need to run -mkdir from a remote cluster against the destination cluster.

I searched for an answer but couldn't find anything. Is this possible?

Kireet Bhat
  • Are you looking for a solution where the source directories are created in the target without explicitly creating them on the target Hadoop cluster? The question in the title is not clear. – Ajay Kharade Apr 28 '20 at 19:52
  • That is correct, Ajay_SK. My current distcp command is: hadoop distcp -f -m 1 -pbt -skipcrccheck -update – Kireet Bhat Apr 28 '20 at 20:45

1 Answer

When DistCp is invoked without -update or -overwrite, its default behavior is to create the directories first/ and second/ under /target. DistCp accepts -skipcrccheck only together with -update, so that flag is left out of this example:

hadoop distcp hdfs://nn1:8020/source/first hdfs://nn1:8020/source/second hdfs://nn2:8020/target

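Resulting files on the target (assuming first/ contains the files 1 and 2, and second/ contains 10 and 20):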

hdfs://nn2:8020/target/first/1
hdfs://nn2:8020/target/first/2
hdfs://nn2:8020/target/second/10
hdfs://nn2:8020/target/second/20
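
If -update has to stay, DistCp copies the contents of each source directory into the target instead of creating the source directory under it, so the partition directory has to be addressed explicitly. One possible approach is to pass a fully qualified URI to hdfs dfs -mkdir -p, which lets a client on one cluster create a directory on another. The following is only a sketch: the NameNode addresses reuse nn1/nn2 from the example above, the partition path and the partition_cmds helper are hypothetical, and it assumes the client can reach the destination NameNode and has write permission there. It prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: build the commands for copying one partition with -update.
# SRC_NN and DST_NN are placeholders taken from the example above.
SRC_NN="hdfs://nn1:8020"
DST_NN="hdfs://nn2:8020"

# Hypothetical helper: print the mkdir and distcp commands for one partition path.
partition_cmds() {
  local part="$1"   # absolute HDFS path of the partition, identical on both clusters
  # A fully qualified URI makes the FS shell talk to the destination NameNode.
  echo "hdfs dfs -mkdir -p ${DST_NN}${part}"
  # With -update, the contents of the source partition are synced into the target directory.
  echo "hadoop distcp -update -skipcrccheck ${SRC_NN}${part} ${DST_NN}${part}"
}

# Dry run for an assumed partition path; this only prints the two commands.
partition_cmds "/data/tbl/dt=2020-04-28"
```

To actually execute the commands, the printed lines could be piped to bash on a host that has the Hadoop client installed. HA nameservices or Kerberos across realms may need extra client configuration that this sketch does not cover.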
Ajay Kharade