Get a listing, add up sizes, and sort it by owner (with Perl)
perl -wE'
    chdir (shift // ".");
    for (glob ".* *") {
        next if not -f;
        ($owner_id, $size) = (stat)[4,7]
            or do { warn "Trouble stat for: $_"; next };
        $rept{$owner_id} += $size
    }
    say( getpwuid($_) // $_, " => $rept{$_} bytes" ) for sort keys %rept
'
I didn't get to benchmark it, and it'd be worth trying it against an approach where the directory is iterated over, as opposed to glob-ed (though I found glob much faster in a related problem).
I expect good runtimes in comparison with ls, which slows down dramatically as the file list in a single directory gets long. That slowdown comes from the system, so Perl is affected as well, but as far as I recall it handles it far better. However, I've seen a dramatic slowdown only once entries get to half a million or so, not a few thousand, so I am not sure why it runs slowly on your system.
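For benchmarking against a non-Perl baseline, per-owner sums can also be sketched with standard tools. This is an illustrative sketch, not part of the answer's method, and it assumes GNU find (for the -printf directive):

```shell
# Sum file sizes per owner in the top level only (no recursion),
# assuming GNU find: %u prints the owner name, %s the size in bytes.
find . -maxdepth 1 -type f -printf '%u %s\n' |
    awk '{ total[$1] += $2 }
         END { for (u in total) print u, "=>", total[u], "bytes" }'
```

Like the glob approach, this reads a single directory; dropping -maxdepth 1 makes it recursive, which is useful for sanity-checking the File::Find version below.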
If this needs to recurse into directories it finds, then use File::Find. For example
perl -MFile::Find -wE'
    $dir = shift // ".";
    find( sub {
        return if not -f;
        ($owner_id, $size) = (stat)[4,7]
            or do { warn "Trouble stat for: $_"; return };
        $rept{$owner_id} += $size
    }, $dir );
    say( getpwuid($_) // $_, " => $rept{$_} bytes" ) for sort keys %rept
'
This scans a directory with 2.4 GB of mostly small files, spread over a hierarchy of subdirectories, in a little over 2 seconds. A du -sh took around 5 seconds (the first time round).
It is reasonable to bring these two into one script:
use warnings;
use strict;
use feature 'say';
use File::Find;
use Getopt::Long;

my %rept;

sub get_sizes {
    return if not -f;
    my ($owner_id, $size) = (stat)[4,7]
        or do { warn "Trouble stat for: $_"; return };
    $rept{$owner_id} += $size;
}

my ($dir, $recurse) = ('.', '');

GetOptions('recursive|r!' => \$recurse, 'directory|d=s' => \$dir)
    or die "Usage: $0 [--recursive] [--directory dirname]\n";

($recurse)
    ? find( { wanted => \&get_sizes }, $dir )
    : find( { wanted => \&get_sizes,
              preprocess => sub { return grep { -f } @_ } }, $dir );

say( getpwuid($_) // $_, " => $rept{$_} bytes" ) for sort keys %rept;
I find this to perform about the same as the one-directory-only code above when run non-recursively (the default as it stands).
Note that the File::Find::Rule interface has many conveniences, but it is slower in some important use cases, which clearly matters here. (That analysis should be redone, since it's a few years old.)