2

How do I recursively count files in a list of Linux directories?

Example:

/dog/
  /a.txt
  /b.txt
  /c.ipynb

/cat/
  /d.txt
  /e.pdf
  /f.png
  /g.txt

/owl/
  /h.txt

I want the following output:

5 .txt
1 .ipynb
1 .pdf
1 .png

I tried the following, with no luck.

find . -type f | sed -n 's/..*\.//p' | sort | uniq -c
Cyrus
ThinkGeek
  • What does not work? Are you missing the dot in front of the file extensions or do you want to have the output sorted numerically? – Cyrus Dec 20 '20 at 14:36

3 Answers

1

This find + gawk may work for you:

find . -type f -print0 |
awk -v RS='\0' -F/ '{sub(/^.*\./, ".", $NF); ++freq[$NF]} END {for (i in freq) print freq[i], i}'

Using -print0 in find safely handles file names that contain whitespace, newlines, or glob characters. Likewise, -v RS='\0' in awk makes the NUL byte the record separator.
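To see why NUL separation matters, here is a minimal sketch that counts files both ways. The helper names count_files_nul and count_files_lines are made up for illustration, and the sketch assumes a find that supports -print0 (GNU and BSD find do).

```shell
#!/bin/sh
# count_files_nul DIR: hypothetical helper. Counts regular files by
# counting the NUL terminators that find -print0 emits, one per path.
count_files_nul() {
    find "$1" -type f -print0 | tr -cd '\000' | wc -c
}

# count_files_lines DIR: hypothetical helper. Naive line-based count,
# which over-counts when a file name contains a newline.
count_files_lines() {
    find "$1" -type f | wc -l
}
```

For a directory holding a single file whose name contains a newline, the first helper reports one file while the second reports two, which is the kind of miscount that NUL-separated records avoid.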

anubhava
1

Use Perl one-liners to make the output in the format you need, like so:

find . -type f | perl -pe 's{.*[.]}{.}' | sort | uniq -c | perl -lane 'print join "\t", @F;' | sort -nr

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start

Timur Shtatland
1

Assume you have a known directory path with the subdirectories foo, bar, baz, qux, quux, and gorge, and you want to count the file types by extension, but only for the subdirectories foo, baz, and qux.

The simplest way is to just run:

$ find /path/{foo,baz,qux} -type f -exec sh -c 'echo "${0##*.}"' {} \; | sort | uniq -c

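The -exec part uses the sh parameter expansion ${0##*.}, which removes everything up to and including the last dot, to print the extension (without the leading dot). A file whose path contains no dot is printed in full. The brace list {foo,baz,qux} is expanded by the calling shell (for example bash), not by find.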
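The same parameter-expansion idea can be sketched as a small function that also keeps the leading dot and sorts by count, to match the output requested in the question. This is an illustrative variant, not the answer's exact command: count_ext and the "(none)" label are made-up names, and it uses -exec ... {} + so that one sh process handles many files.

```shell
#!/bin/sh
# count_ext DIR...: hypothetical helper, not from the answers above.
# Prints "<count> .<ext>" per extension, most frequent first.
count_ext() {
    find "$@" -type f -exec sh -c '
        for path in "$@"; do
            name=${path##*/}      # file name without its directory
            stem=${name%.*}       # name up to the last dot
            ext=${name##*.}       # text after the last dot
            if [ "$stem" = "$name" ] || [ -z "$stem" ] || [ -z "$ext" ]; then
                echo "(none)"     # no dot, a dotfile such as .bashrc, or a trailing dot
            else
                printf ".%s\n" "$ext"
            fi
        done' sh {} + |
        sort | uniq -c | sed 's/^ *//' | sort -k1,1nr -k2
}
```

It would be called as count_ext dog cat owl. Paths are passed as arguments rather than parsed from lines, so spaces in file names are harmless. An extension that itself contains a newline would still be miscounted, because the later stages are line-based.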

kvantour