
Is there a reason why

find . -mindepth 1 -maxdepth 1 | wc -l

is recommended over

ls -1 | wc -l

(or vice versa?)

to count the total number of files and directories inside a folder?

Notes:

  1. This question is concerned only with counting entries.
  2. There are no files with a leading `.` in their names.
  3. There may be non-standard file names, e.g. containing a `\n`.
sjsam
  • Both will fail against filenames with newlines. As far as counting is concerned I don't see any difference; both will fail. Where did you find the recommendation? – heemayl May 10 '16 at 03:34
  • Do you need to account for non-standard file-names, including ones with `\n` (or `\r`) and other control chars etc. embedded? In that case search for `find . -print0` solutions, experiment and then post a new Q. If you don't need a completely bullet-proof solution, either version above seems usable. Good luck. – shellter May 10 '16 at 03:34
  • Do you need to count `.exrc` sort of files (leading `.` char)? Then you need to add that to the cmd. Good luck. – shellter May 10 '16 at 03:35
  • @shellter. I am not worried about leading `.` but there may be non-standard files – sjsam May 10 '16 at 03:41

3 Answers


The first command...

find . -mindepth 1 -maxdepth 1 | wc -l

...will list files and directories whose names start with `.`, while your ls command will not. The equivalent ls command would be:

ls -A | wc -l

Both will then give you the same answer. As folks pointed out in the comments, both will give you a wrong answer if there are entries with embedded newlines, because the commands above simply count the number of lines of output.
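For example, in an empty scratch directory (a quick bash demonstration; the path is made up):

mkdir /tmp/demo && cd /tmp/demo
touch 'plain' $'with\nnewline'            # two entries; one name contains a newline
find . -mindepth 1 -maxdepth 1 | wc -l    # prints 3, not 2
ls -A | wc -l                             # prints 3 as well

Both report three lines because the embedded newline splits one name across two lines of output.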

Here's one way to count the number of files that is independent of filename quirks:

find . -mindepth 1 -maxdepth 1 -print0 | xargs -0 -I{} echo | wc -l

This passes the filenames to xargs NUL-terminated rather than newline-terminated; xargs then prints a blank line for each file, and we count the lines of output from xargs.
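A shorter variant of the same idea (a sketch, assuming GNU tr) strips everything except the NUL separators and counts the remaining bytes:

find . -mindepth 1 -maxdepth 1 -print0 | tr -dc '\0' | wc -c

Every entry contributes exactly one NUL byte, so the byte count equals the number of entries regardless of what characters the names contain.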

larsks
    Or see the second answer to [this question](http://stackoverflow.com/questions/11307257/is-there-a-bash-command-which-counts-files). – larsks May 10 '16 at 03:47
  • As I mentioned in the question, there are no files with a leading `.`, but there are non-standard files. Thanks for the solution. – sjsam May 10 '16 at 03:56
  • did you mean `ls -a | wc -l` ? `-A` option says `do not list implied . and ..` – Sundeep May 10 '16 at 04:24

The reason find(1) is preferred to ls(1) is that

  • ls defaults to sorting the list of files
  • find has no sorting capability

Sorting can be extremely memory-consuming for large data sets. So even though you can use ls -f or ls -U to disable sorting, I find using find safer, because I know the directory listing won't be sorted no matter what options are passed to it.
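You can see the difference on a large directory (a rough sketch; /big/dir is a made-up path and timings will vary by system):

time ls /big/dir | wc -l                              # reads and sorts everything before printing
time ls -U /big/dir | wc -l                           # sorting disabled
time find /big/dir -mindepth 1 -maxdepth 1 | wc -l    # never sorts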

In any case, telling the command to print less about each file helps both performance and correctness: performance because the command can avoid the stat(2) call, and correctness because if you, for example, print only the inode, you can be certain that the file's name won't affect the output (e.g. line breaks, carriage returns or other odd characters).
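As a sketch of the inode idea (GNU find), printing only the inode number keeps file names out of the output entirely:

find . -mindepth 1 -maxdepth 1 -printf '%i\n' | wc -l

Each entry contributes one line containing nothing but a number, so newlines and other odd characters in the names cannot affect the count.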

mogsie

May I add some more?

Reasons to use find instead of ls

As stated by mogsie, the main reason is performance:

  • ls sorts its output (by name, by default), so it must wait for the whole list to be returned by the OS and sort it before printing anything to standard output
  • find, on the other hand, has no sorting capability, so it can process entries as soon as the OS returns a buffer of them, potentially before the whole listing has been read, and never needs to hold the full list in memory to sort it

Effective solution

Disclosure: I used this solution in production to count the entries of a directory with about 300k items.

find . -mindepth 1 -maxdepth 1 -printf '.' | wc -m 

Basically, this prints a dot to standard output for every filesystem entry, then counts the printed characters.

The advantage regarding file names is easy to see: they are never printed, so odd characters cannot skew the count. The performance advantage is that no file attribute needs to be read just to count the entries (as you would expect from a function that counts the files in a directory), unless you specify some filter.
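If you do want a filter, the same trick still applies; this variant (a sketch) counts only regular files:

find . -mindepth 1 -maxdepth 1 -type f -printf '.' | wc -m

find now has to determine each entry's type, but the output is still just one dot per match, so the count stays immune to odd file names.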

If you want to start the count and come back later to see how many items have been found so far, you can redirect standard output to a file (ideally on a tmpfs, so you never write to disk), detach the shell, and count the characters in the file whenever you like:

nohup find . -mindepth 1 -maxdepth 1 -printf '.' > /tmp/count.txt &

Then simply counting the dots in the file gives you the current count:

wc -m /tmp/count.txt

...and if you want to watch the counter update live:

watch wc -m /tmp/count.txt
  • Thanks for your time. But I am wondering if `-printf '.'` is the right idea. I/O operations are generally slow. – sjsam Jul 21 '23 at 22:02
  • Note that the `ls` approach prints much more to stdout. Furthermore, you are reading a potentially big folder, which *is* an expensive IO operation. Probably the most expensive in the sequence. With `printf '.'` you print just one char per file to stdout. If you need more performance you have to call the operating system directly from C (for example with [readdir](https://man7.org/linux/man-pages/man3/readdir.3.html), or [getdents](https://man7.org/linux/man-pages/man2/getdents.2.html) if you want to deal with the syscall yourself), accumulate into a counter and then print it. – Marco Carlo Moriggi Jul 26 '23 at 12:21