36

I would like to know whether there is any command or expression to get only the file name in Hadoop. I need to fetch only the name of the file; when I run hadoop fs -ls it prints the whole path.

I tried the command below, but I am wondering whether there is a better way to do it.

hadoop fs -ls <HDFS_DIR>|cut -d ' ' -f17 
Gyanendra Dwivedi
Navneet Kumar

7 Answers

47

The following command will return filenames only:

hdfs dfs -stat "%n" my/path/*

Added Feb 04 '21:

For the last few years I have actually been using

hdfs dfs -ls -d my/path/* | awk '{print $8}'

and

hdfs dfs -ls my/path | grep -e "^-" | awk '{print $8}'
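
If the names are consumed in a loop, note that a `for` loop over a command substitution splits on whitespace, so a name containing spaces is broken into pieces. A minimal sketch of a line-by-line alternative follows; `print_names` is a hypothetical helper name, not part of Hadoop, and the pipeline in the comment assumes a working Hadoop client.

```shell
# Hypothetical helper: reads names on stdin, one per line, and prints each
# one in brackets so that embedded spaces are visible.
print_names() {
  while IFS= read -r name; do
    # IFS= and -r keep leading spaces and backslashes intact
    printf '[%s]\n' "$name"
  done
}

# Intended use (my/path is the placeholder from the answer above):
#   hdfs dfs -stat "%n" my/path/* | print_names
```

Replace the `printf` with whatever needs to be done per file.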

MichealKum
  • 1
    `hadoop fs` is deprecated, use `hdfs dfs` instead – jirislav Aug 31 '17 at 15:30
  • 1
    Only returns the filename (and * doesn't seem to work). – samthebest Oct 09 '17 at 16:00
  • 1
    great answer, I am not sure why awk and sed trickery would be needed with this being available. – anirudh.vyas Dec 13 '17 at 20:20
  • It works only when I run it as a single command. When I run it in a for loop it does not give the expected result: it splits the file names on spaces. –  Aug 13 '18 at 10:15
  • @jirislav `hadoop dfs` is deprecated, not the `fs` one. The latter is perfectly fine. As to the answer, be cautious as it returns **basenames**, not the full filenames – DimG Jul 30 '19 at 19:29
  • Any reason why we need the quotes round the %n? This seems to work just fine: `hdfs dfs -stat %n my/path/*` – user2739472 Feb 03 '21 at 08:38
  • Actually last few years I use `hdfs dfs -ls -d my/path/* | awk '{print $8}'` or `hdfs dfs -ls my/path | grep -e "^-" | awk '{print $8}'` – MichealKum Feb 04 '21 at 10:46
42

It seems hadoop ls does not support any options to output just the filenames, or even just the last column.

If you want to get the last column reliably, you should first collapse each run of spaces into a single space, so that you can then address the last column by its field number:

hadoop fs -ls | sed '1d;s/  */ /g' | cut -d\  -f8

This gives you just the last column, but each entry still includes the whole path. If you want just the filenames, you can use basename, as @rojomoke suggests:

hadoop fs -ls | sed '1d;s/  */ /g' | cut -d\  -f8 | xargs -n 1 basename

I also filtered out the first line, which says Found ?x items.

Note: as @felix-frank points out in the comments, the above command will not correctly preserve file names with multiple consecutive spaces. Hence a more correct solution, proposed by Felix:

hadoop fs -ls /tmp | sed 1d | perl -wlne'print +(split " ",$_,8)[7]'
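
For anyone who would rather avoid perl, the same idea (skip the first seven columns, keep the rest of the line untouched) can be sketched with sed alone. This is only a sketch: it assumes the usual eight-column listing format (permissions, replication, owner, group, size, date, time, path), and `ls_basenames` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `hadoop fs -ls` style output on stdin and
# prints only the last path component of each entry.
ls_basenames() {
  # 1. delete the "Found N items" header if it is present
  # 2. strip the first seven space-separated columns
  # 3. strip everything up to and including the last slash
  sed -E '/^Found [0-9]+ items$/d; s/^([^ ]+ +){7}//; s#.*/##'
}

# Intended use (requires a Hadoop client):
#   hadoop fs -ls /tmp | ls_basenames
```

Because the path column is never split, consecutive spaces inside a file name are preserved.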

Jakub Kotowski
  • 1
    Thanks a ton !! It worked !! My need was the full path so.. Thanks @rojomoke also for answering – Navneet Kumar Feb 06 '14 at 05:58
  • Please use `hdfs dfs -ls /data/*.txt* | rev | cut -d' ' -f1 | rev` instead; the code above doesn't support all file names. – sri hari kali charan Tummala Nov 03 '15 at 15:56
  • Also note that the first line `Found ?X items` is not shown when listing files with a glob pattern, e.g. `/path/to/*.log`. A more precise way of filtering it out: `hdfs dfs -ls /path/*.log | sed 's/  */ /g;/Found [0-9]* items/d' | cut -d' ' -f8` – ttimasdf Nov 27 '20 at 09:56
28

I hope this helps someone. With version 2.8.x and later (available in 3 as well):

hadoop fs -ls  -C  /paths/
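
If the paths come out as a single line when the result is stored in a variable, one possible cause (an assumption, since it depends on how the command is called) is an unquoted expansion: in POSIX shells such as bash, word splitting turns the newlines into spaces. A small sketch with stand-in paths instead of a real listing:

```shell
# Stand-in for the output of `hadoop fs -ls -C /paths/` (hypothetical paths).
listing=$(printf '%s\n' /paths/a.txt /paths/b.txt)

joined=$(echo $listing)      # unquoted: the lines are joined with spaces
separate=$(echo "$listing")  # quoted: one path per line is preserved

printf '%s\n' "$separate"
```

Quoting the variable, or piping the command straight into the next tool, keeps one path per line.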
anirudh.vyas
  • Perfetto, exactly what I needed, Thanks ! – tricky Apr 25 '18 at 13:40
  • But if the directory has multiple files, this command gives all the file paths as a single string. Please suggest how to get each file path on a new line. –  Aug 13 '18 at 10:18
2

Here is one more solution that I use often. There are a few related things:

  • list only the files and directories, without the Found x items line, with

hdfs dfs -ls -d mypath/*

  • keep full path only with

hdfs dfs -ls -d mypath/* | awk '{print $8}'

  • only file names

hdfs dfs -ls -d mypath/* | awk '{print $8}' | while read -r fn; do basename "$fn"; done

  • and, in addition, use path templates if necessary:

hdfs dfs -ls -d {my,his}path/*.{txt,doc}

MichealKum
1
 hadoop fs -ls  -C  /path/* | xargs -n 1 basename
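
Note that `xargs` splits its input on whitespace by default, so a path containing spaces is passed to basename in pieces. A minimal sketch of a pure-shell alternative, using the `${var##*/}` parameter expansion to drop everything up to the last slash; `path_basenames` is a hypothetical helper name.

```shell
# Hypothetical helper: reads full paths on stdin, one per line, and prints
# the part after the last slash without calling basename.
path_basenames() {
  while IFS= read -r p; do
    printf '%s\n' "${p##*/}"
  done
}

# Intended use (requires a Hadoop client, version 2.8 or later for -C):
#   hadoop fs -ls -C /path/* | path_basenames
```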
loneStar
0

Use the basename command, which strips any prefix ending in '/' from the string.

basename $(hadoop fs -ls)
rojomoke
0

The command below returns only the file names in the directory. awk splits each line on '/' and prints the last field, which is the file name.

hdfs dfs -ls /<folder> | awk -F'/' '{print $NF}'
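
One caveat: the `Found x items` header contains no slash, so awk treats the whole line as the last field and prints it unchanged. A minimal sketch of a variant that skips such lines by requiring more than one field; `last_component` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `hdfs dfs -ls` style output on stdin and prints
# the text after the last '/' for every line that contains a slash.
last_component() {
  awk -F'/' 'NF > 1 {print $NF}'
}

# Intended use (requires a Hadoop client):
#   hdfs dfs -ls /<folder> | last_component
```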

Vinod ram