8

In my web application I render pages with PHP and then generate static HTML files from them. The static HTML is served to users to improve performance. The HTML files eventually become stale and need to be deleted.

I am debating between two ways to write the eviction script.

The first is using a single find command, like

find /var/www/cache -type f -mmin +10 -exec rm \{} \;

The second form is by piping through xargs, something like

find /var/www/cache -type f -mmin +10 -print0 | xargs -0 rm

The first form invokes rm once for each file it finds, while the second form sends batches of file names to a single rm (but the file list might be very long).

Which form would be faster?

In my case, the cache directory is shared between a few web servers, so this is all done over NFS, if that matters for this issue.

SamB
yhager

4 Answers

20

The xargs version is dramatically faster with a lot of files than the -exec version as you posted it. That's because rm is executed once for each file you want to remove, while xargs lumps as many files as possible together into a single rm command.

With tens or hundreds of thousands of files, it can be the difference between a minute or less versus the better part of an hour.

You can get the same batching behavior with -exec by terminating the command with "+" instead of "\;". This option is only available in newer versions of find.

The following two are roughly equivalent:

find . -print0 | xargs -0 rm
find . -exec rm \{} +

Note that the xargs version will still run slightly faster (by a few percent) on a multi-processor system, because some of the work can be parallelized. This is particularly true if a lot of computation is involved.
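To illustrate the parallelism point: GNU xargs also accepts -P to run several child processes at once, and -n to cap the arguments per invocation. A minimal sketch, using a hypothetical /tmp/cache-demo directory in place of the real cache (gains are small for a cheap command like rm, larger when the per-file work is expensive):

```shell
# Set up a throwaway directory with 100 files (stand-in for /var/www/cache).
mkdir -p /tmp/cache-demo
for i in $(seq 1 100); do touch "/tmp/cache-demo/file$i"; done

# -P 4: up to four rm processes in parallel (GNU/BSD xargs extension).
# -n 25: at most 25 file names per rm invocation.
find /tmp/cache-demo -type f -print0 | xargs -0 -P 4 -n 25 rm
```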

tylerl
  • I found xargs a faster way to go. Deleting the first 250,000 files with -exec took almost two hours. Then I stumbled on this answer and tried xargs: it completed the remaining 750,000 in half an hour like a champ! – bbbco Feb 27 '14 at 19:01
  • `-exec ... +` is part of the POSIX standard for `find`; support for it should be fairly widespread. – chepner Apr 22 '16 at 17:29
6

I expect the xargs version to be slightly faster, as you aren't spawning a process for each filename. But I would be surprised if there was actually much difference in practice. If you're worried about the long list xargs sends to each invocation of rm, you can use -n with xargs to limit the number of arguments per command. In any case, xargs knows the system's maximum command-line length and won't exceed it.
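A quick sketch of that batching flag, using a hypothetical /tmp/xargs-demo directory: with -n 2, five files are removed in three separate rm invocations rather than one.

```shell
# Create five files in a throwaway directory.
mkdir -p /tmp/xargs-demo
for i in 1 2 3 4 5; do touch "/tmp/xargs-demo/f$i"; done

# -n 2 caps each rm invocation at two arguments, so xargs
# runs rm three times here (2 + 2 + 1 files).
find /tmp/xargs-demo -type f -print0 | xargs -0 -n 2 rm
```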

kbyrd
2

The find command has a -delete action built in; perhaps that could be useful as well? http://lists.freebsd.org/pipermail/freebsd-questions/2004-July/051768.html
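Applied to the question's scenario, -delete avoids spawning rm entirely (it is a GNU/BSD find extension, not POSIX). A sketch using a hypothetical /tmp/delete-demo directory in place of /var/www/cache:

```shell
# Create a stale file and backdate its mtime so -mmin +10 matches it
# (touch -d with a relative date is a GNU coreutils feature).
mkdir -p /tmp/delete-demo
touch /tmp/delete-demo/stale.html
touch -d '20 minutes ago' /tmp/delete-demo/stale.html

# Delete files not modified in the last 10 minutes, with no rm process.
find /tmp/delete-demo -type f -mmin +10 -delete
```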

natevw
  • Nice, thanks. I looked at the man page, and there is one caveat that should be understood before anyone uses the -delete option with find. I can't paste it here, but be sure to read that man page carefully. – yhager Mar 01 '11 at 05:02
1

Using xargs is faster than -exec with find.

I tried counting the lines in every .js file under a node_modules folder, once with -exec and once with xargs. The output is below.

time find . -type f -name "*.js" -exec wc -l {} \;

real    0m0.296s
user    0m0.133s
sys     0m0.038s

time find . -type f -name "*.js" | xargs wc -l

real    0m0.019s
user    0m0.005s
sys     0m0.006s

Here, the xargs version ran roughly 15 times faster than the -exec version.
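Note that most of this gap comes from -exec's one-process-per-file form; terminating with "+" batches arguments the way xargs does. A sketch with a hypothetical /tmp/wc-demo directory:

```shell
# One two-line .js file to count.
mkdir -p /tmp/wc-demo
printf 'a\nb\n' > /tmp/wc-demo/x.js

# The "+" terminator passes many file names to a single wc invocation,
# so this form benchmarks much closer to the xargs pipeline above.
find /tmp/wc-demo -type f -name "*.js" -exec wc -l {} +
```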

abhishek