
I used to do `ls path-to-whatever | wc -l`, until I discovered that it actually consumes a huge amount of memory. Then I moved to `find path-to-whatever -name "*" | wc -l`, which seems to consume a much more modest amount of memory, regardless of how many files you have.

Then I learned that `ls` is mostly slow and less memory efficient because it sorts the results. By using `ls -f | grep -c .`, one gets very fast results; the only problem is filenames that contain line breaks. However, that is a very minor problem for most use cases.

Is this the fastest way to count files?

EDIT / Possible Answer: It seems that when it comes to Big Data, some versions of `ls`, `find`, etc. have been reported to hang with more than 8 million files (this still needs to be confirmed). To succeed with very large file counts (my guess is more than 2.2 billion), one should use the `getdents64` system call instead of `getdents`, which can be done from most programming languages that support POSIX standards. Some filesystems might offer faster non-POSIX methods for counting files.
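
For illustration, here is a minimal sketch of the `getdents64` approach on Linux, counting regular files by reading raw directory entries in batches (the `linux_dirent64` layout follows `man 2 getdents64`; the buffer size is an arbitrary choice and I have not benchmarked this):

#define _GNU_SOURCE
#include <dirent.h>       // DT_REG
#include <fcntl.h>        // open, O_RDONLY, O_DIRECTORY
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>  // SYS_getdents64
#include <unistd.h>       // syscall, close

// glibc does not expose this struct; layout as documented in man 2 getdents64
struct linux_dirent64 {
  uint64_t       d_ino;     // inode number
  int64_t        d_off;     // offset to next entry
  unsigned short d_reclen;  // length of this record
  unsigned char  d_type;    // file type (DT_REG, DT_DIR, ...)
  char           d_name[];  // null-terminated filename
};

int main (int argc, char *argv[]) {

  if (argc <= 1)                          // require dir
    return 1;

  int fd = open(argv[1], O_RDONLY | O_DIRECTORY);
  if (fd == -1)                           // dir not found
    return 2;

  char buf[32 * 1024];                    // one batch of raw directory entries
  long long c = 0;

  for (;;) {
    long nread = syscall(SYS_getdents64, fd, buf, sizeof(buf));
    if (nread <= 0)                       // 0 = end of directory, < 0 = error
      break;
    long pos = 0;
    while (pos < nread) {
      struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
      if (d->d_type == DT_REG)            // count regular files only
        c++;
      pos += d->d_reclen;
    }
  }

  printf ("%lli\n", c);
  close(fd);
  return 0;
}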

Ahti Ahde
  • Count the number of files in a directory, non-recursive? Approximately, how many files are we talking about? – James Brown May 20 '17 at 10:52
  • http://stackoverflow.com/questions/3702104/find-the-number-of-files-in-a-directory – James Brown May 20 '17 at 11:13
  • Possible duplicate of [Find the number of files in a directory](http://stackoverflow.com/questions/3702104/find-the-number-of-files-in-a-directory) – shellter May 20 '17 at 12:40
  • @James Brown: We are talking about millions or billions of files; actually I am more interested in the theoretical computational complexity of the different alternatives. I am also interested in filesystem / OS level answers, not just "getting the thing done". `ls -U1` seems to be a very good alternative; however, it is not POSIX standard. I am satisfied with Unix solutions, though. – Ahti Ahde May 21 '17 at 18:51
  • Edited the question as I already found a satisfactory answer; however, I am still interested in the theoretical side of the issue. – Ahti Ahde May 21 '17 at 19:12
  • This answer sort of addresses the "large directory" issue. Directories with (over) thousands of entries will become slow under any BSD ffs-type system. – Erik Bennett May 23 '17 at 19:35
  • Is this on Linux, btw? – James Brown May 26 '17 at 21:15
  • Is `grep -c .` really faster than `wc -l`? – Keith Thompson May 26 '17 at 21:53
  • Perhaps not; the `grep -c .` came from some other answer, where I guess it was used to get rid of some `ls` header lines. – Ahti Ahde May 29 '17 at 03:43
  • I believe the `grep -c .` answer's context was `find`, which gave a leading `./` on the file names that we wanted to count. However, it seems that grep is not significantly slower either. – Ahti Ahde May 29 '17 at 03:54

1 Answer

One way would be to use `readdir` and count the entries (in one directory). Below I'm counting regular files using `d_type==DT_REG`, which is available only on some OSs and filesystems (`man readdir` and see NOTES), but you could just comment out that check and count all the directory entries:

#include <stdio.h>
#include <dirent.h>

int main (int argc, char *argv[]) {

  struct dirent *entry;
  DIR *dirp;

  long long c = 0;                        // 64-bit counter; must be initialized

  if(argc<=1)                             // require dir
    return 1;

  dirp = opendir (argv[1]);

  if (dirp == NULL) {                     // dir not found
    return 2;
  }

  while ((entry = readdir(dirp)) != NULL) {
    if (entry->d_type == DT_REG)          // count regular files only
      c++;
    // printf ("%s\n", entry->d_name);    // for outputting filenames
  }
  printf ("%lli\n", c);

  closedir (dirp);
  return 0;
}

Compile and run:

$ gcc code.c
$ ./a.out ~
254

(I need to clean my home dir :)

Edit:

I touched 1,000,000 files into a dir and ran a quick comparison (best user+sys of 5 runs presented):

$ time ls -f | grep -c .
1000005

real    0m1.771s
user    0m0.656s
sys     0m1.244s

$ time ls -f | wc -l
1000005

real    0m1.733s
user    0m0.520s
sys     0m1.248s

$ time ../a.out  .
1000003

real    0m0.474s
user    0m0.048s
sys     0m0.424s

Edit 2:

As requested in comments:

$ time ./a.out testdir | wc -l
1000004

real    0m0.567s
user    0m0.124s
sys     0m0.468s
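
(Presumably the change for this run was just enabling the filename output inside the `if`, so that `wc -l` counts the printed regular file names plus the final count line, something like:)

  while ((entry = readdir(dirp)) != NULL) {
    if (entry->d_type == DT_REG) {
      c++;
      printf ("%s\n", entry->d_name);     // filename output enabled for wc -l
    }
  }
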
James Brown
  • Many thanks for the great answer! Could you also run a test that does the same as `ls -f` (enable text output and `time ../a.out | wc -l`) so that the two methods are functionally identical? – Ahti Ahde May 29 '17 at 03:54
  • Added the result. You're welcome. – James Brown May 29 '17 at 11:38