Questions tagged [distcp]

Hadoop tool used for large inter- and intra-cluster copying.

The distcp command is a tool used for large inter- and intra-cluster copying. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which copies a partition of the files specified in the source list.
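A minimal usage sketch of the command described above (the NameNode hostnames and paths are placeholders, not taken from any question on this page):

```shell
# Copy a directory tree from one cluster's HDFS to another's.
# source-nn and dest-nn are hypothetical NameNode hostnames.
hadoop distcp \
  hdfs://source-nn:8020/user/data \
  hdfs://dest-nn:8020/user/data

# -update copies only files that differ from the destination,
# -p preserves file status (permissions, etc.), and
# -m caps the number of simultaneous map tasks doing the copy.
hadoop distcp -update -p -m 20 \
  hdfs://source-nn:8020/user/data \
  hdfs://dest-nn:8020/user/data
```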

181 questions
0
votes
1 answer

Hadoop distcp not working

I am trying to copy data from one HDFS to another HDFS. Any suggestion why the 1st one works but not the 2nd one? (works) hadoop distcp hdfs://abc.net:8020/foo/bar webhdfs://def.net:14000/bar/foo (does not work) hadoop distcp…
Rio mario
  • 283
  • 6
  • 18
0
votes
1 answer

How do I determine if a call to distcp2 was successful?

The best advice I could find online is that you should either compare the files after transfer or make a second run with -update, and the second is considered unreliable. Is there a way of determining if the call even returned without an exception?
Robert Rapplean
  • 672
  • 1
  • 9
  • 30
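Like most Hadoop CLI tools, distcp exits with a non-zero status when the underlying MapReduce job fails, so a wrapper script can at least detect the no-exception case the question asks about. A sketch, with placeholder paths:

```shell
# Run distcp and branch on its exit status; a non-zero code
# indicates the copy job failed or was killed.
if hadoop distcp hdfs://src-nn:8020/data hdfs://dst-nn:8020/data; then
  echo "distcp completed successfully"
else
  echo "distcp failed" >&2
fi
```

Note this only confirms the job finished without error; byte-level verification still requires comparing checksums or a second `-update` pass.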
0
votes
2 answers

hadoop distcp not working, MR job in accepted state

I am trying to copy data from a CDH4 to a CDH5 cluster. When I submit the distcp job from CDH5, the MR job goes to the accepted state and stays there (I have tried it multiple times; it stayed there for more than 15 hrs). The data I want to copy is less than 10MB.…
0
votes
1 answer

distcp2 in CDH5.2 with MR1

We have a requirement to restrict mapper bandwidth when running distcp from S3 to a local cluster. So I downloaded hadoop-distcp-2.5.0-cdh5.2.0-20141009.063640-188.jar from https://repository.cloudera.com. Here is the link:…
roy
  • 6,344
  • 24
  • 92
  • 174
0
votes
1 answer

Copying data from gateway node to a different cluster in same network

Is there a way to copy data from the gateway node in Cluster1 directly to HDFS of Cluster2 when they are in the same network? Currently I am doing scp to the gateway node of Cluster2 and uploading the data to HDFS. Thanks,
darkknight444
  • 546
  • 8
  • 21
0
votes
1 answer

Hadoop distcp command using a different S3 destination

I am using a Eucalyptus private cloud on which I have set up a CDH5 HDFS. I would like to back up my HDFS to the Eucalyptus S3. The classic way is to use distcp as suggested here: http://wiki.apache.org/hadoop/AmazonS3 , i.e. hadoop distcp…
Geeky
  • 35
  • 5
0
votes
2 answers

Multiple source files for s3distcp

Is there a way to copy a list of files from S3 to HDFS instead of a complete folder using s3distcp? This is for when srcPattern cannot work. I have multiple files in an S3 folder, all having different names. I want to copy only specific files to an HDFS…
its me
  • 127
  • 2
  • 8
0
votes
0 answers

Copying data between 2 different hadoop clusters

I am trying to copy data from one HDFS directory to another using distcp: Source hadoop version: hadoop version Hadoop 2.0.0-cdh4.3.1 Destination hadoop version: hadoop version Hadoop 2.0.0-cdh4.4.0 Command I am using is: hadoop distcp…
Rio
  • 765
  • 3
  • 17
  • 37
0
votes
1 answer

How to copy from subdirectories using s3DistCp

Trying to use s3DistCp to copy from s3://my-bucket/dir1/ , s3://my-bucket/dir2, s3://my-bucket/dir3. All three dirs have some files in them. Wanted to do something like: hadoop jar s3distcp.jar --src s3://my-bucket/*/ --dest…
yunt
  • 1
  • 1
0
votes
1 answer

Import data from inter cluster hadoop with different versions using command line

Can you tell me the exact command to import data from HDFS between two different Hadoop versions, one with Hadoop 2.0.4-alpha and the other with the 2.4.0 version? How can I use the distcp command in this case?
0
votes
1 answer

How does block size vary from Cluster1 to Cluster2 if we use the DistCp command?

I am running the "DistCp" command to move a few critical files from my Cluster1 to Cluster2. These critical files previously resided with a 64MB block size, and have now moved to Cluster2 (which has a 128MB block size). After the DistCp move, how does the…
-1
votes
1 answer

Client cannot authenticate via: [TOKEN, KERBEROS]

From my Spark application I am trying to distcp from HDFS to S3. My app does some processing on data and writes it to HDFS, and that data I am trying to push to S3 via distcp. I am facing the below error. Any pointer will be…
Mukesh Kumar
  • 317
  • 1
  • 5
  • 16
-1
votes
1 answer

Monitor and verify long distcp operation

Are there any other possibilities to monitor and verify large hadoop distcp, cluster-to-cluster, HDFS copy jobs other than examining the yarn/mapreduce logs? (millions of small and large files, estimated runtime: a couple of days, changing network…
matz3
  • 88
  • 8
-1
votes
1 answer

how to move hdfs files as ORC files in S3 using distcp?

I have a requirement to move text files in HDFS to AWS S3. The files in HDFS are text files and non-partitioned. The output files in S3 after migration should be in ORC and partitioned on a specific column. Finally, a Hive table is created on top of…
nagendra
  • 1,885
  • 3
  • 17
  • 27
-1
votes
2 answers

Failed to copy file from FTP to HDFS

I have an FTP server (F [ftp]), a Linux box (S [standalone]) and a Hadoop cluster (C [cluster]). The current file flow is F->S->C. I am trying to improve performance by skipping S. The current flow is: wget…
Denis
  • 1,130
  • 3
  • 17
  • 32