
I've nearly reached my limit for the permitted number of files in my Linux home directory, and I'm curious about where all the files are.

In any directory I can run, for example, `find . -type f | wc -l` to show a count of how many files are in that directory and its subdirectories. What I'd like is to generate a complete list of all subdirectories (and sub-subdirectories, etc.), each with a count of all files contained in it and its subdirectories - if possible ranked by count, descending.

E.g. if my file structure looks like this:

Home/
  file1.txt
  file2.txt
  Docs/
    file3.txt
    Notes/
      file4.txt
      file5.txt
    Queries/
      file6.txt
  Photos/
    file7.jpg

The output would be something like this:

7  Home
4  Home/Docs
2  Home/Docs/Notes
1  Home/Docs/Queries
1  Home/Photos

Any suggestions greatly appreciated. (Also a quick explanation of the answer, so I can learn from this!). Thanks.

Richard Inglis
  • What makes you think you're nearing the limit of files *per directory*? Any limit of files per directory that I'm aware of doesn't need to count files in subdirectories, only files directly in that directory... Maybe you meant "inodes per partition"? – GreyCat Aug 02 '11 at 20:41
  • I think it's the total number of files I have on the system (that was the gist of the explanation given by the sysadmin...). What I want is to find if there's a big folder full of old cache files or logs or crash reports that I can delete. – Richard Inglis Aug 02 '11 at 20:54
  • ... as for instance 10000 files in a hidden folder named `.../.metadata/.plugins/org.eclipse.epp.usagedata.recording` - blimey! – Richard Inglis Aug 02 '11 at 21:04
  • `du ~/* | sort -n` will give you a sorted list of directory sizes, which is likely to be useful also – evil otto Aug 02 '11 at 21:34
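Building on the `du` suggestion: if your `du` comes from GNU coreutils 8.22 or later (an assumption - `--inodes` is a GNU extension), it can count inodes instead of bytes, which gives exactly the cumulative files-plus-directories count per directory:

```shell
# GNU coreutils 8.22+ only: --inodes counts inodes (each file and each
# directory once, cumulatively per directory) rather than disk usage.
# sort -nr puts the heaviest directories first.
du --inodes ~ | sort -nr | head -n 20
```

Note that each directory's total includes the directory inode itself, so the numbers run slightly higher than a pure file count.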

7 Answers


I use the following command:

find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n

This produces something like:

[root@ip-***-***-***-*** /]# find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
      1 .autofsck
      1 stat-nginx-access
      1 stat-nginx-error
      2 tmp
     14 boot
     88 bin
    163 sbin
    291 lib64
    597 etc
    841 opt
   1169 root
   2900 lib
   7634 home
  42479 usr
  80964 var
ajtrichards
    This seems to be the most efficient solution, as it does not fork a new process for every file to be counted, but rather processes a large stream of files with a single 'cut' command. – nlx-ck Mar 06 '16 at 13:19

This should work:

find ~ -type d -exec sh -c "fc=\$(find '{}' -type f | wc -l); echo -e \"\$fc\t{}\"" \; | sort -nr

Explanation: The command above runs `find ~ -type d` to find all the subdirectories of the home directory. For each of them, it runs a short shell script that counts the total number of files in that subdirectory (using the `find "$dir" -type f | wc -l` command that you already know) and echoes the count followed by the directory name. The sort command then sorts by the total number of files, in descending order.

This is not the most efficient solution (you end up scanning the same directories many times), but I am not sure you can do much better with a one-liner :-)
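A one-pass alternative is possible, though no longer a one-liner. The sketch below (`count_cumulative` is a hypothetical name, not from this thread) lists every file once and lets awk credit each ancestor directory of its path; it assumes you invoke it with a relative path and that no filename contains an embedded newline:

```shell
# count_cumulative: hypothetical helper. One find pass; awk credits every
# ancestor directory of each file's path, so each count is cumulative.
# Run it with a relative path; newlines in filenames break the pipeline.
count_cumulative() {
  find "${1:-.}" -type f -print | awk -F/ '
  {
    path = ""
    for (i = 1; i < NF; i++) {          # every ancestor directory of the file
      path = (path == "" ? $i : path "/" $i)
      count[path]++                     # one more file somewhere below path
    }
  }
  END { for (d in count) print count[d], d }
  ' | sort -nr
}
```

Compared with the nested-find approach, each directory tree is scanned exactly once.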

sagi

Simpler and more efficient:

find ~ -type f -exec dirname {} \; | sort | uniq -c | sort -nr
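If your find is GNU find (an assumption - `-printf` is a GNU extension), you can avoid forking a `dirname` process per file:

```shell
# GNU find only: %h expands to the file's parent directory, so no
# per-file dirname process is needed.
find ~ -type f -printf '%h\n' | sort | uniq -c | sort -nr
```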
wjb
countFiles () {
    # call the recursive function, throw away stdout and send stderr to stdout
    # then sort numerically
    countFiles_rec "$1" 2>&1 >/dev/null | sort -nr
}

countFiles_rec () {
    local -i nfiles
    local dir="$1"

    # count the number of files in this directory only
    nfiles=$(find "$dir" -mindepth 1 -maxdepth 1 -type f -print | wc -l)

    # loop over the subdirectories of this directory
    while IFS= read -r subdir; do

        # invoke the recursive function for each one 
        # save the output in the positional parameters
        set -- $(countFiles_rec "$subdir")

        # accumulate the number of files found under the subdirectory
        (( nfiles += $1 ))

    done < <(find "$dir" -mindepth 1 -maxdepth 1 -type d -print)

    # print the number of files here, to both stdout and stderr
    printf "%d %s\n" $nfiles "$dir" | tee /dev/stderr
}


countFiles Home

produces

7 Home
4 Home/Docs
2 Home/Docs/Notes
1 Home/Photos
1 Home/Docs/Queries
glenn jackman
  • Thanks glenn - sorry to be dense, but to use this do I need to put the function definitions in a file somewhere? – Richard Inglis Aug 02 '11 at 22:00
  • Yep. If you're writing a script, just add them to that file. – glenn jackman Aug 03 '11 at 01:00
  • You don't even need to put it in a file, you can paste the function straight into your current prompt and bash will define the function for you. – dalore Jun 10 '14 at 10:24
  • Note that this will only work in **bash**, not shell per se (sh reports a syntax error for `done < <(find "$dir" -mindepth 1 -maxdepth 1 -type d -print)`) – om-nom-nom Sep 02 '14 at 11:08
find . -type d -exec sh -c '(echo -n "{} "; ls {} | wc -l)' \; | sort -n -k 2

This is pretty efficient.

It displays the counts in ascending order (i.e. largest at the end). To get descending order, add the `-r` option to `sort`.

If you run this command in the `/` directory, it will scan the entire filesystem and tell you which directories contain the most files and subdirectories. It's a good way to see where all your inodes are being used.

Note: this will not work for directories whose names contain spaces, but you could modify it to handle that case if it's a problem for you.
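One space-safe modification (a sketch, which also swaps the column order so a plain `sort -n` works): pass each directory to the inner shell as a real argument instead of substituting `{}` into the script text. Filenames with embedded newlines still break it.

```shell
# Space-safe: "$1" is the directory found by find, passed as an argument,
# so names with spaces survive. Counts direct entries only (files and
# subdirectories), like the ls-based original.
find . -type d -exec sh -c 'printf "%d %s\n" "$(ls -1 "$1" | wc -l)" "$1"' sh {} \; | sort -n
```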

loupiote

See the following example: sort by column 2 in reverse. Use `sort -k 2 -r`; `-k 2` means sort by column 2 (space-separated), and `-r` means reverse.

# ls -lF /mnt/sda1/var/lib/docker/165536.165536/aufs/mnt/ | sort -k 2 -r
total 972
drwxr-xr-x   65 165536   165536        4096 Jun  5 12:23 ad45ea3c6a03aa958adaa4d5ad6fc25d31778961266972a69291d3664e3f4d37/
drwxr-xr-x   19 165536   165536        4096 Jun  6 06:46 7fa7f957669da82a8750e432f034be6f0a9a7f5afc0a242bb00eb8024f77d683/
drwxr-xr-x    2 165536   165536        4096 May  8 02:20 49e067ffea226cfebc8b95410e90c4bad6a0e9bc711562dd5f98b7d755fe6efb/
drwxr-xr-x    2 165536   165536        4096 May  8 01:19 45ec026dd49c188c68b55dcf98fda27d1f9dd32f825035d94849b91c433b6dd3/
drwxr-xr-x    2 165536   165536        4096 Mar 13 06:08 0d6e95d4605ab34d1454de99e38af59a267960999f408f720d0299ef8d90046e/
drwxr-xr-x    2 165536   165536        4096 Mar 13 02:25 e9b252980cd573c78065e8bfe1d22f01b7ba761cc63d3dbad284f5d31379865a/
drwxr-xr-x    2 165536   165536        4096 Mar 13 02:24 f4aa333b9c208b18faf00b00da150b242a7a601693197c1f1ca78b9ab2403409/
drwxr-xr-x    2 165536   165536        4096 Mar 13 02:24 3946669d530695da2837b2b5ed43afa11addc25232b29cc085a19c769425b36b/
drwxr-xr-x    2 165536   165536        4096 Mar 11 11:11 44293f77f63806a58d9b97c3c9f7f1397b6f0935e236250e24c9af4a73b3e35b/
osexp2000

If, however, you are fine with the non-cumulative solution using dirname (see wjb's answer), then far more efficient is:

find ~ -type f -print0 | xargs -0 dirname | sort | uniq -c | sort -n

Note that this does not display empty dirs. For those you can use `find ~ -type d -empty`, if your version of find supports it.

  • Hmm, since `dirname` is there, the path is missing. My guess is either `xargs -0` does not work on your system, or you have files called " " or so, i.e. the file name is composed of just whitespace. The latter is possible, but weird. – Henrik Hedemann Nov 28 '15 at 21:22
  • Actually I've just tested with whitespace files/dirs (`touch " "` and `mkdir " "` to create such weird stuff) and the command still works. However, `find -print0 | xargs -0` hasn't always been around; cf. https://www.gnu.org/software/findutils/manual/html_mono/find.html . – Henrik Hedemann Nov 28 '15 at 21:39