55

Is there any way we can overwrite existing files while copying from HDFS using:

hadoop fs -copyToLocal <HDFS PATH> <local path>
hjamali52
  • Unfortunately not, but you could easily knock together a little script to do this for you, if it's that much of an issue. Combining this with an existence check and `rm` should suffice (see the sketch after these comments). – Quetzalcoatl May 08 '13 at 10:20
  • You should move this into the answers section! – greedybuddha May 08 '13 at 17:42
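
A minimal sketch of such a wrapper script, per the first comment's suggestion (the script name and argument handling are illustrative, not from the thread):

    #!/usr/bin/env bash
    # overwrite-get.sh <HDFS path> <local path>
    # Existence check plus rm, then a plain copyToLocal.
    hdfs_path="$1"
    local_path="$2"
    [ -e "$local_path" ] && rm -rf "$local_path"
    hadoop fs -copyToLocal "$hdfs_path" "$local_path"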

8 Answers

49
hadoop fs -copyFromLocal -f $LOCAL_MOUNT_SRC_PATH/yourfilename.txt your_hdfs_file-path

So the -f option does the trick for you.

It works for -copyToLocal as well.
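
For example, to pull a file down over an existing local copy (a sketch assuming a recent Hadoop 3.x release, where get/copyToLocal accept -f; per the 2.4.1 docs linked in the comments below, older releases do not):

    hadoop fs -copyToLocal -f /user/hduser/yourfilename.txt /tmp/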

Arijit Sen
  • Oh yeah, did you know -f doesn't work for copyToLocal? – DPEZ Jul 03 '19 at 14:58
  • -f does not work with copyToLocal. Check the documentation: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html#copyToLocal – thegreatcoder Jan 24 '20 at 16:57
13

You can first delete, then write.

hadoop fs -rmr <path> removes everything under the given path in HDFS, including the path itself.

rm -rf <path> removes it in the local file system.

Make sure there are no other files in the directory.
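
Applied to the question's copyToLocal case, a minimal sketch (the paths are illustrative):

    # Clear the local target first, then copy down from HDFS.
    rm -rf /tmp/localcopy
    hadoop fs -copyToLocal /user/hduser/data /tmp/localcopy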

smttsp
  • `hadoop fs -rmr <path>` not only removes files under this path, it also removes the directory itself. If you just want to remove the files and not the directory, you should use `hadoop fs -rm <path>/*` – Haimei Jun 17 '14 at 19:36
  • Won't it give an exception when he tries to run the same command after `hadoop fs -rm <path>/*`? – smttsp Jun 18 '14 at 07:01
  • In fact, if you do that, it gives a warning, not an exception. – Haimei Jun 18 '14 at 15:45
  • Try it and tell me again – smttsp Jun 18 '14 at 19:57
  • Btw, rmr has been deprecated; use rm -r instead. – Paul Rigor Jan 31 '15 at 14:02
  • Apart from the fact that rmr has been deprecated in favour of rm -r, I think it's not good practice to delete all the content of a dir when maybe only one file will be overwritten. The answer from Arijit seems to do the trick in the correct way. – rollsappletree Feb 03 '15 at 11:56
7

I used the command below and it helped:

hadoop fs -put -f <<local path>> <<hdfs>>

But from the put docs:

Copy single src, or multiple srcs from local file system to the destination file system.

Sohan
5

There is no force option for either of these commands (get/copyToLocal).

Below are three options:

  1. Remove the file on the local machine with the rm command, then use copyToLocal/get (see the sketch after this list).

  2. Rename your local file so that you can fetch the file with the same name as on the cluster. Use the mv command for that, then use the get/copyToLocal command.

  3. Rename the file on the cluster itself and use copyToLocal:

    hadoop fs -mv [oldpath] [newpath]
    hadoop fs -copyToLocal [newpath] .
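
A minimal sketch of options 1 and 2 as commands (the paths and file names are illustrative):

    # Option 1: remove the local copy, then fetch.
    rm -f ./samefile.txt
    hadoop fs -get /user/hduser/samefile.txt .

    # Option 2: keep the old local copy under a new name, then fetch.
    mv ./samefile.txt ./samefile.txt.bak
    hadoop fs -copyToLocal /user/hduser/samefile.txt .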
    
Balaswamy Vaddeman
5

The -f option did the trick.

Example:

bin>hdfs dfs -put -f D:\DEV\hadoopsampledata\mydata.json /input
tthreetorch
  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). – Neil Lunn Jun 23 '17 at 02:51
  • OP is for `copyToLocal`, not for local to HDFS. – ChikuMiku Apr 12 '18 at 05:01
2

You can try distcp with -update. The main advantage is that it will update the target only when there is a change in the file.

hadoop distcp -update file://source hdfs://namenode/target

hadoop distcp -update  file:///home/hduser/pigSample/labfiles/SampleData/books.csv  hdfs://10.184.37.158:9000/yesB
sterin jacob
-1

You could try this (chaining with && so the put runs only after the rm succeeds):

bin/hadoop fs -rm /path_of_the_file && bin/hadoop fs -put ~/input_path /output_path
Tariq
-3

The -f option works for me.

hdfs dfs -copyFromLocal -f [LOCALFILEPATH] [HDFSFILEPAHT]

Robin Wang