61

-put and -copyFromLocal are documented as identical, while most examples use the verbose variant -copyFromLocal. Why?

The same question applies to -get and -copyToLocal.

snappy
  • Please check this link, which covers the details at the source-code level: http://hakunamapdata.com/why-put-is-better-than-copyfromlocal-when-coping-files-to-hdfs/ – Jagadish Talluri Oct 01 '15 at 12:18

6 Answers

68

-copyFromLocal is similar to -put command, except that the source is restricted to a local file reference.

So basically, you can do everything with -put that you can do with -copyFromLocal, but not vice versa.

Similarly,

-copyToLocal is similar to the -get command, except that the destination is restricted to a local file reference.

Hence, you can use -get instead of -copyToLocal, but not the other way round.

Reference: Hadoop's documentation.

Update: For the latest as of Oct 2015, please see this answer below.

Ozair Kafray
41

Here is an example: suppose your HDFS contains the path /tmp/dir/abc.txt and your local disk contains the same path. The HDFS API then can't know which one you mean unless you specify a scheme such as file:// or hdfs://, so it may pick the path you did not want to copy.

That is why -copyFromLocal exists: it prevents you from accidentally copying the wrong file by restricting the source parameter to the local filesystem.

-put is for more advanced users who know which scheme to put in front.

New Hadoop users are often confused about which filesystem they are currently working in and where their files actually are.
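To make the role of the scheme concrete, here is a minimal sketch that only inspects path arguments with the standard java.net.URI class. It does not call Hadoop and does not reproduce how Hadoop resolves paths; the class name SchemeDemo and the helper schemeOf are hypothetical names chosen for illustration.

```java
import java.net.URI;

public class SchemeDemo {
    // Hypothetical helper: returns the URI scheme of a path argument,
    // or a placeholder when the argument carries no scheme at all.
    static String schemeOf(String pathArg) {
        String scheme = URI.create(pathArg).getScheme();
        return scheme == null ? "(default filesystem)" : scheme;
    }

    public static void main(String[] args) {
        String[] pathArgs = {
            "/tmp/dir/abc.txt",         // no scheme: the tool has to pick a filesystem
            "file:///tmp/dir/abc.txt",  // explicitly the local filesystem
            "hdfs:///tmp/dir/abc.txt"   // explicitly HDFS
        };
        for (String arg : pathArgs) {
            System.out.println(arg + " -> " + schemeOf(arg));
        }
    }
}
```

The bare path is the ambiguous case described above: nothing in the argument itself says which filesystem is meant, whereas the file:// and hdfs:// forms state it explicitly.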

Thomas Jungblut
  • What do you mean by "the hdfs API won't know which one you mean"? For '-put' the source is always the first argument. Or do you mean that some users may confuse '-put' with '-get'? – snappy Oct 18 '11 at 17:52
  • No, neither way. We are speaking about two different file systems here. HDFS and local file system (say ext4). By using `bin/hadoop fs -put /tmp/somepath /user/hadoop/somepath` the command actually does not know whether `/tmp/somepath` exists in both filesystems, or just in local filesystem. Same thing with the destination path. – Thomas Jungblut Oct 18 '11 at 17:58
  • So the first parameter is not always a local fs path, so to speak. You can `put` from one HDFS to another if you'd like. `-copyFromLocal` will ensure that it picks only from the local disk and uploads to HDFS. – Thomas Jungblut Oct 18 '11 at 17:58
  • Why does it need to know? Your command example (and the -copyFromLocal variant) always copies /tmp/somepath/* from local to /user/hadoop/somepath/* on HDFS, and creates /user/hadoop/somepath directories if they are not yet created. Right? – snappy Oct 18 '11 at 18:08
  • No, put would prefer the HDFS scheme over the local file system. copyFromLocal would not do this; it would pick the file from the local file system. – Thomas Jungblut Oct 19 '11 at 08:06
  • Great answer, thanks for explaining why you would ever need or want to use -copyFromLocal – James Allen Jul 01 '15 at 16:42
21

Despite what is claimed by the documentation, as of now (Oct. 2015), both -copyFromLocal and -put are the same.

From the online help:

[cloudera@quickstart ~]$ hdfs dfs -help copyFromLocal 
-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst> :
  Identical to the -put command.

And this is confirmed by looking at the sources, where you can see that the CopyFromLocal class extends the Put class, but without adding any new behavior:

  public static class CopyFromLocal extends Put {
    public static final String NAME = "copyFromLocal";
    public static final String USAGE = Put.USAGE;
    public static final String DESCRIPTION = "Identical to the -put command.";
  }

  public static class CopyToLocal extends Get {
    public static final String NAME = "copyToLocal";
    public static final String USAGE = Get.USAGE;
    public static final String DESCRIPTION = "Identical to the -get command.";
  }

As you may notice, the same holds for get/copyToLocal.
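The inheritance pattern in that excerpt can be tried out in isolation. The following is a sketch with hypothetical stand-in classes (PutLike, CopyFromLocalLike, and the run method are invented for illustration and are not Hadoop code); it only shows that a subclass which redeclares constants and overrides nothing behaves exactly like its parent.

```java
public class AliasDemo {
    // Hypothetical stand-in for a shell command class; not Hadoop's Put.
    static class PutLike {
        public static final String NAME = "put";

        // Toy behavior standing in for the real copy logic.
        String run(String src, String dst) {
            return "copy " + src + " to " + dst;
        }
    }

    // Only the name constant changes; no method is overridden.
    static class CopyFromLocalLike extends PutLike {
        public static final String NAME = "copyFromLocal";
    }

    public static void main(String[] args) {
        String viaPut = new PutLike().run("a.txt", "/user/demo");
        String viaAlias = new CopyFromLocalLike().run("a.txt", "/user/demo");
        System.out.println(PutLike.NAME + ": " + viaPut);
        System.out.println(CopyFromLocalLike.NAME + ": " + viaAlias);
        System.out.println("same behavior: " + viaPut.equals(viaAlias));
    }
}
```

Because the subclass hides only the static NAME field, both objects execute the same inherited run method, which is the sense in which the two commands are aliases in the excerpt above.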

Sylvain Leroux
4

Both are the same, except that -copyFromLocal is restricted to copying from the local filesystem, while -put can take files from any source (another HDFS, the local filesystem, etc.).

Manish Agrawal
1

They're the same. This can be seen by printing usage for hdfs (or hadoop) on a command-line:

$ hadoop fs -help
# Usage: hadoop fs [generic options]
# [ . . . ]
# -copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst> :
#   Identical to the -put command.

# -copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
#   Identical to the -get command.

Same for hdfs (the hadoop command specific for HDFS filesystems):

$ hdfs dfs -help
# [ . . . ]
# -copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst> :
#   Identical to the -put command.

# -copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
#   Identical to the -get command.
user2314737
0

The -put and -copyFromLocal commands work exactly the same. You cannot use the -put command to copy files from one HDFS directory to another. Let's see this with an example: say your HDFS root has two directories, named 'test1' and 'test2'. If 'test1' contains a file 'customer.txt' and you try copying it to the 'test2' directory:

$ hadoop fs -put /test1/customer.txt /test2

This results in a 'no such file or directory' error, since 'put' looks for the file in the local file system and not in HDFS. Both commands are meant only for copying files (or directories) from the local file system to HDFS.

  • Maybe if you specify the filesystem in the first argument, it wouldn't read from the local filesystem? `hadoop fs -put hdfs:///test1/customer.txt hdfs:///test2`? – OneCricketeer Feb 21 '18 at 02:28