On a Solaris system that processes large numbers of files and stores their information in a database (yes, I know that querying the database is the quickest way to find out how many files we have), I need a fast way to monitor the files as they progress through the system on their way into the database.

Currently I use a Perl script that reads each directory into an array, takes the size of the array, and sends it to a monitoring script. Unfortunately, as our system grows, this monitor is getting slower and slower.

I am looking for a much faster method. Right now the monitor only updates every 15-20 seconds, because it first has to finish counting every directory involved.

I am fairly certain that my bottleneck is the step that reads the directory into an array.

I don't need any information about the files, I don't need sizes or file names, just the number of files in the directory.

In my code I do not count hidden files or the text files I use to hold configuration information. It would be great if this behavior were preserved, but it is certainly not mandatory.

I have found some references to counting inodes with C code or something along those lines, but I am not very experienced in that area.

I would like to make this monitor as real-time as possible.

The Perl code I use looks like this:

opendir(my $dh, $currentDir) or die "Cannot open directory: $!";
my @files = grep { !/^\./ && !/config_file/ } readdir($dh);   # skip hidden files and config files
closedir($dh);
my $count = @files;   # an array in scalar context yields its length
Andrew
  • Use Perl threads or forks to minimize waiting time. – mpapec Jul 18 '13 at 20:08
  • @mpapec I like the idea. In the best case, though, wouldn't my waiting time still be close to the time it takes to run this code on my largest directory alone? If so, that isn't a bad idea, but unfortunately most of my directories have fewer than 50 files while one or two have thousands. I would love to find a way to avoid reading every file in the directory entirely. – Andrew Jul 18 '13 at 20:13
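mpapec's fork suggestion could be sketched with core Perl's fork() and a single pipe: one child per directory, each counting with a streaming readdir loop and reporting one short line back to the parent. This is a minimal sketch under stated assumptions: the helper names `count_entries` and `count_in_parallel` are hypothetical, each result line is assumed to be shorter than the pipe's atomic-write limit (PIPE_BUF) so children's writes do not interleave, and an unreadable directory is reported as -1. As Andrew points out, the total wall-clock time is still bounded by the largest directory, and with hundreds of directories a bounded worker pool would be gentler than one process per directory.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: count entries in one directory by streaming readdir,
# skipping hidden files and anything matching /config_file/.
sub count_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir($dir): $!";
    my $count = 0;
    while (defined(my $de = readdir($dh))) {
        next if $de =~ /^\./ or $de =~ /config_file/;
        $count++;
    }
    closedir($dh);
    return $count;
}

# Hypothetical helper: fork one child per directory so the directories are
# scanned concurrently; each child sends "dir<TAB>count\n" over a shared pipe.
# Assumes each line is shorter than PIPE_BUF, so writes do not interleave.
sub count_in_parallel {
    my @dirs = @_;
    pipe(my $reader, my $writer) or die "pipe: $!";
    my @pids;
    for my $dir (@dirs) {
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                       # child process
            close($reader);
            my $n = eval { count_entries($dir) };
            $n = -1 unless defined $n;         # -1 marks an unreadable directory
            print {$writer} "$dir\t$n\n";
            close($writer);                    # flushes the single short line
            POSIX::_exit(0);                   # skip END blocks and inherited buffers
        }
        push @pids, $pid;                      # parent process
    }
    close($writer);    # parent drops its write end so the read loop sees EOF
    my %counts;
    while (my $line = <$reader>) {
        chomp $line;
        my ($dir, $n) = split /\t/, $line, 2;
        $counts{$dir} = $n;
    }
    close($reader);
    waitpid($_, 0) for @pids;                  # reap every child
    return \%counts;
}

# Usage: perl parallel_count.pl DIR [DIR ...]
if (@ARGV) {
    my $counts = count_in_parallel(@ARGV);
    print "$_: $counts->{$_}\n" for sort keys %$counts;
}
```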

2 Answers

What you do right now reads the whole directory (more or less) into memory, only to throw the contents away and keep the count. Avoid that by streaming the directory instead:

my $count = 0;
opendir(my $dh, $curDir) or die "opendir($curDir): $!";
while (defined(my $de = readdir($dh))) {   # one entry at a time, nothing stored
  next if $de =~ /^\./ or $de =~ /config_file/;   # skip hidden and config files
  $count++;
}
closedir($dh);
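Wrapped as a reusable subroutine, the streaming loop might look like the sketch below. The name `count_files` and the optional `$skip_re` parameter are hypothetical; the default pattern reproduces the question's rule of skipping hidden files and names containing `config_file`.

```perl
use strict;
use warnings;

# Hypothetical helper: stream a directory and count the entries that are not
# matched by $skip_re, without ever building a list of names in memory.
sub count_files {
    my ($dir, $skip_re) = @_;
    $skip_re //= qr/^\.|config_file/;    # default: hidden files and config files
    opendir(my $dh, $dir) or die "opendir($dir): $!";
    my $count = 0;
    # Explicit defined() keeps a file literally named "0" from ending the loop;
    # recent Perls add this check implicitly, but being explicit costs nothing.
    while (defined(my $entry = readdir($dh))) {
        $count++ unless $entry =~ $skip_re;
    }
    closedir($dh);
    return $count;
}

# Usage: perl count_files.pl DIR [DIR ...]
print "$_\t", count_files($_), "\n" for @ARGV;
```

Passing a different pattern, such as `qr/^\.\.?$/`, would count everything except `.` and `..`.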

Importantly, don't use glob() in any of its forms. glob() will expensively stat() every entry, which is not overhead you want.
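To check on your own directories whether glob() or the list-building readdir is really the slow part, a comparison with the core Benchmark module could look like the sketch below. The three subroutine names are hypothetical. The glob variant uses File::Glob's `bsd_glob` so that spaces in the path are not treated as pattern separators; glob metacharacters such as `[` or `{` in the path would still need escaping. Timings will depend on the filesystem and directory size, so treat the output as a local measurement only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Basename qw(basename);
use File::Glob qw(bsd_glob);

# Hypothetical: the question's approach, building the full list of names.
sub count_list {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir($dir): $!";
    my @files = grep { !/^\./ && !/config_file/ } readdir($dh);
    closedir($dh);
    return scalar @files;
}

# Hypothetical: the streaming approach from the answer.
sub count_stream {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir($dir): $!";
    my $count = 0;
    while (defined(my $de = readdir($dh))) {
        next if $de =~ /^\./ or $de =~ /config_file/;
        $count++;
    }
    closedir($dh);
    return $count;
}

# Hypothetical: the glob() approach the answer warns against.
# "*" does not match names with a leading dot, so only config files are filtered.
sub count_glob {
    my ($dir) = @_;
    return scalar grep { basename($_) !~ /config_file/ } bsd_glob("$dir/*");
}

# Usage: perl bench_count.pl DIR
if (@ARGV) {
    my $dir = $ARGV[0];
    # Run each variant for at least 2 CPU seconds and print a comparison table.
    cmpthese(-2, {
        list   => sub { count_list($dir) },
        stream => sub { count_stream($dir) },
        glob   => sub { count_glob($dir) },
    });
}
```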

Now, you might have more sophisticated and lighter-weight ways of doing this, depending on OS or filesystem capabilities (Linux, for comparison, offers inotify), but streaming the directory as above is about as good as you'll get portably.

pilcrow
  • I like it. I think you are right. I want to stay away from filesystem-specific solutions for now. I may go down that path in the future, but I want to keep this thing as portable as possible. Thanks! This sped things up a bit. It isn't a LOT faster, but it does help. – Andrew Jul 18 '13 at 20:46
Keep it short.

@files = readdir(DIR) - 2;

The -2 is because readdir counts "." and ".." as directory entries.

print @files . " files found\n";
exit;

1 files found

Azrael
  • Your assumption that there are exactly 2 dotfiles in any directory is extremely unsafe and very often incorrect. Furthermore, the question asked to allow the exclusion of specific configuration files. Lastly, this suggestion is no faster than the previous one, as the bottleneck is the read operation, which exists in the solution above. – Andrew Oct 27 '14 at 18:13
  • -1 A bit problematic. `readdir` in a scalar context returns the next directory entry, rather than the number of (remaining?) entries. You're then subtracting two from this filename, which is probably converted to numeric value zero. You're then assigning the scalar -2 to a list. And as @Andrew noted, the OP needs to exclude all dotfiles and other specific patterns, anyway. – pilcrow Mar 16 '16 at 10:15
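The context issue pilcrow describes is easy to see directly. The sketch below (the subroutine name `show_readdir_context` is hypothetical) reads one entry in scalar context, rewinds, then reads every entry in list context, and counts everything except `.` and `..` by comparing names rather than subtracting 2 from anything. It only demonstrates context; it does not apply the question's hidden-file and config-file exclusions.

```perl
use strict;
use warnings;

# Hypothetical demo of readdir's context-dependent return value.
sub show_readdir_context {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir($dir): $!";

    my $first = readdir($dh);     # scalar context: ONE entry name (a string)
    rewinddir($dh);               # start over from the first entry
    my @all = readdir($dh);       # list context: every remaining entry
    closedir($dh);

    # Filter "." and ".." by name; grep in scalar context returns a count.
    my $real = grep { $_ ne '.' && $_ ne '..' } @all;
    return ($first, scalar @all, $real);
}

# Usage: perl readdir_context.pl DIR
if (@ARGV) {
    my ($first, $total, $real) = show_readdir_context($ARGV[0]);
    print "first entry: $first\nall entries: $total\nexcluding . and ..: $real\n";
}
```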