
I've nearly reached my limit for the permitted number of files in my Linux home directory, and I'm curious about where all the files are.

In any directory I can run, for example, `find . -type f | wc -l` to show a count of how many files are in that directory and its subdirectories. What I'd like is to generate a complete list of all subdirectories (and sub-subdirectories, etc.), each with a count of all files contained in it and its subdirectories - if possible ranked by count, descending.

E.g. if my file structure looks like this:

Home/
  file1.txt
  file2.txt
  Docs/
    file3.txt
    Notes/
      file4.txt
      file5.txt
    Queries/
      file6.txt
  Photos/
    file7.jpg

The output would be something like this:

7  Home
4  Home/Docs
2  Home/Docs/Notes
1  Home/Docs/Queries
1  Home/Photos

Any suggestions greatly appreciated. (Also a quick explanation of the answer, so I can learn from this!). Thanks.

Richard Inglis
  • What makes you think you're nearing the limit of files *per directory*? Any limit of files per directory that I'm aware of doesn't need to count files in subdirectories, only files directly in that directory... Maybe you meant "inodes per partition"? – GreyCat Aug 02 '11 at 20:41
  • I think it's the total number of files I have on the system (that was the gist of the explanation given by the sysadmin...). What I want is to find if there's a big folder full of old cache files or logs or crash reports that I can delete. – Richard Inglis Aug 02 '11 at 20:54
  • ... as for instance 10000 files in a hidden folder named `.../.metadata/.plugins/org.eclipse.epp.usagedata.recording` - blimey! – Richard Inglis Aug 02 '11 at 21:04
  • `du ~/* | sort -n` will give you a sorted list of directory sizes, which is likely to be useful also – evil otto Aug 02 '11 at 21:34
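Building on the `du` suggestion: if your `du` comes from GNU coreutils 8.22 or later (an assumption - `--inodes` is a GNU extension), it can count inodes instead of bytes, which gives exactly the cumulative files-plus-directories count per directory:

```shell
# GNU coreutils 8.22+ only: --inodes counts inodes (each file and each
# directory once, cumulatively per directory) rather than disk usage.
# sort -nr puts the heaviest directories first.
du --inodes ~ | sort -nr | head -n 20
```

Note that each directory's total includes the directory inode itself, so the numbers run slightly higher than a pure file count.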

7 Answers


I use the following command:

find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n

This produces something like:

[root@ip-***-***-***-*** /]# find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
      1 .autofsck
      1 stat-nginx-access
      1 stat-nginx-error
      2 tmp
     14 boot
     88 bin
    163 sbin
    291 lib64
    597 etc
    841 opt
   1169 root
   2900 lib
   7634 home
  42479 usr
  80964 var
ajtrichards
    This seems to be the most efficient solution, as it does not fork a new process for every file to be counted, but rather processes a large stream of files with a single 'cut' command. – nlx-ck Mar 06 '16 at 13:19

This should work:

find ~ -type d -exec sh -c "fc=\$(find '{}' -type f | wc -l); echo -e \"\$fc\t{}\"" \; | sort -nr

Explanation: The command above runs `find ~ -type d` to find all the subdirectories of the home directory. For each of them, it runs a short shell script that counts the total number of files in that subdirectory (using the `find "$dir" -type f | wc -l` command that you already know) and echoes the count followed by the directory name. The sort command then sorts by the total number of files, in descending order.

This is not the most efficient solution (you end up scanning the same directories many times), but I am not sure you can do much better with a one-liner :-)
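A one-pass alternative is possible, though no longer a one-liner. The sketch below (`count_cumulative` is a hypothetical name, not from this thread) lists every file once and lets awk credit each ancestor directory of its path; it assumes you invoke it with a relative path and that no filename contains an embedded newline:

```shell
# count_cumulative: hypothetical helper. One find pass; awk credits every
# ancestor directory of each file's path, so each count is cumulative.
# Run it with a relative path; newlines in filenames break the pipeline.
count_cumulative() {
  find "${1:-.}" -type f -print | awk -F/ '
  {
    path = ""
    for (i = 1; i < NF; i++) {          # every ancestor directory of the file
      path = (path == "" ? $i : path "/" $i)
      count[path]++                     # one more file somewhere below path
    }
  }
  END { for (d in count) print count[d], d }
  ' | sort -nr
}
```

Compared with the nested-find approach, each directory tree is scanned exactly once.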

sagi

Simpler and more efficient:

find ~ -type f -exec dirname {} \; | sort | uniq -c | sort -nr
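If your find is GNU find (an assumption - `-printf` is a GNU extension), you can avoid forking a `dirname` process per file:

```shell
# GNU find only: %h expands to the file's parent directory, so no
# per-file dirname process is needed.
find ~ -type f -printf '%h\n' | sort | uniq -c | sort -nr
```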
wjb
countFiles () {
    # call the recursive function, throw away stdout and send stderr to stdout
    # then sort numerically
    countFiles_rec "$1" 2>&1 >/dev/null | sort -nr
}

countFiles_rec () {
    local -i nfiles
    local dir="$1"

    # count the number of files in this directory only
    nfiles=$(find "$dir" -mindepth 1 -maxdepth 1 -type f -print | wc -l)

    # loop over the subdirectories of this directory
    while IFS= read -r subdir; do

        # invoke the recursive function for each one 
        # save the output in the positional parameters
        set -- $(countFiles_rec "$subdir")

        # accumulate the number of files found under the subdirectory
        (( nfiles += $1 ))

    done < <(find "$dir" -mindepth 1 -maxdepth 1 -type d -print)

    # print the number of files here, to both stdout and stderr
    printf "%d %s\n" $nfiles "$dir" | tee /dev/stderr
}


countFiles Home

produces

7 Home
4 Home/Docs
2 Home/Docs/Notes
1 Home/Photos
1 Home/Docs/Queries
glenn jackman
  • Thanks glenn - sorry to be dense, but to use this do I need to put the function definitions in a file somewhere? – Richard Inglis Aug 02 '11 at 22:00
  • Yep. If you're writing a script, just add them to that file. – glenn jackman Aug 03 '11 at 01:00
  • You don't even need to put it in a file, you can paste the function straight into your current prompt and bash will define the function for you. – dalore Jun 10 '14 at 10:24
  • Note that this will only work in **bash**, not shell per se (sh reports a syntax error for `done < <(find "$dir" -mindepth 1 -maxdepth 1 -type d -print)`) – om-nom-nom Sep 02 '14 at 11:08
find . -type d -exec sh -c '(echo -n "{} "; ls {} | wc -l)' \; | sort -n -k 2

This is pretty efficient.

It displays the counts in ascending order (i.e. largest at the end). To get descending order, add the `-r` option to `sort`.

If you run this command in the `/` directory, it will scan the entire filesystem and tell you which directories contain the most files and subdirectories. It's a good way to see where all your inodes are being used.

Note: this will not work for directories whose names contain spaces, but you could modify it to handle that case if it's a problem for you.
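One space-safe modification (a sketch, which also swaps the column order so a plain `sort -n` works): pass each directory to the inner shell as a real argument instead of substituting `{}` into the script text. Filenames with embedded newlines still break it.

```shell
# Space-safe: "$1" is the directory found by find, passed as an argument,
# so names with spaces survive. Counts direct entries only (files and
# subdirectories), like the ls-based original.
find . -type d -exec sh -c 'printf "%d %s\n" "$(ls -1 "$1" | wc -l)" "$1"' sh {} \; | sort -n
```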

loupiote

See the following example: sort by column 2 in reverse. Use `sort -k 2 -r`; `-k 2` means sort by column 2 (space-separated), and `-r` means reverse.

# ls -lF /mnt/sda1/var/lib/docker/165536.165536/aufs/mnt/ | sort -k 2 -r
total 972
drwxr-xr-x   65 165536   165536        4096 Jun  5 12:23 ad45ea3c6a03aa958adaa4d5ad6fc25d31778961266972a69291d3664e3f4d37/
drwxr-xr-x   19 165536   165536        4096 Jun  6 06:46 7fa7f957669da82a8750e432f034be6f0a9a7f5afc0a242bb00eb8024f77d683/
drwxr-xr-x    2 165536   165536        4096 May  8 02:20 49e067ffea226cfebc8b95410e90c4bad6a0e9bc711562dd5f98b7d755fe6efb/
drwxr-xr-x    2 165536   165536        4096 May  8 01:19 45ec026dd49c188c68b55dcf98fda27d1f9dd32f825035d94849b91c433b6dd3/
drwxr-xr-x    2 165536   165536        4096 Mar 13 06:08 0d6e95d4605ab34d1454de99e38af59a267960999f408f720d0299ef8d90046e/
drwxr-xr-x    2 165536   165536        4096 Mar 13 02:25 e9b252980cd573c78065e8bfe1d22f01b7ba761cc63d3dbad284f5d31379865a/
drwxr-xr-x    2 165536   165536        4096 Mar 13 02:24 f4aa333b9c208b18faf00b00da150b242a7a601693197c1f1ca78b9ab2403409/
drwxr-xr-x    2 165536   165536        4096 Mar 13 02:24 3946669d530695da2837b2b5ed43afa11addc25232b29cc085a19c769425b36b/
drwxr-xr-x    2 165536   165536        4096 Mar 11 11:11 44293f77f63806a58d9b97c3c9f7f1397b6f0935e236250e24c9af4a73b3e35b/
osexp2000

If, however, you are fine with the non-cumulative solution using dirname (see wjb's answer), then far more efficient is:

find ~ -type f -print0 | xargs -0 dirname | sort | uniq -c | sort -n

Note that this does not display empty dirs. For those you can use `find ~ -type d -empty`, if your version of find supports it.

  • Hmm, since `dirname` is there, the path is missing. My guess is either `xargs -0` does not work on your system, or you have files called " " or so, i.e. the file name is composed of just whitespace. The latter is possible, but weird. – Henrik Hedemann Nov 28 '15 at 21:22
  • Actually I've just tested with whitespace files/dirs (`touch " "` and `mkdir " "` to create such weird stuff) and the command still works. However, `find -print0 | xargs -0` hasn't always been around; cf. https://www.gnu.org/software/findutils/manual/html_mono/find.html . – Henrik Hedemann Nov 28 '15 at 21:39