
I want to remove millions of files in a directory and pages mentioned that the following Perl code is the fastest:

perl -e 'chdir "BADnew" or die; opendir D, "."; while ($n = readdir D) { unlink $n }'

However, is it also possible to do this on only files containing the word 'sorted'? Does anyone know how to rewrite this?

skrrgwasme
Koen
    What's the fascination with one-liners? You can convert almost any script to one line by removing newlines, but it is more important that code is readable than that it is on one line. – Jonathan Leffler Mar 29 '15 at 21:02
    Any reason why `rm BADnew/*sorted*` wouldn't work? – Paul Roub Mar 29 '15 at 21:03
  • You need to replace the `unlink $n` with a call to a function that opens the file, looks for the string you're after (`sorted`), closes the file, and then calls `unlink` if the word was found. Not very hard; have at it! – Jonathan Leffler Mar 29 '15 at 21:03
  • @PaulRoub: two reasons: there are millions of files, so the argument list will be too long, and because the files contain `sorted`, not the file names. – Jonathan Leffler Mar 29 '15 at 21:04
    Do you mean files whose contents contain the word *sorted* or those whose names contain the word *sorted*? – Mark Setchell Mar 29 '15 at 21:16

4 Answers


It can be done with a combination of find and grep:

find BADnew -type f -exec grep -q sorted {} \; -exec rm {} \;

The second `-exec` command is executed only if the first one exits with a zero return code.

You can do a dry run first:

find BADnew -type f -exec grep -q sorted {} \; -exec echo {} \;
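To see how the chained `-exec` predicates behave, here is a scratch demo (the directory layout and file names below are invented for illustration):

```shell
# Set up a throwaway directory with one matching and one non-matching file
mkdir -p BADnew
printf 'this line is sorted\n' > BADnew/hit.txt
printf 'no match here\n'       > BADnew/miss.txt

# Dry run: grep -q succeeds only for hit.txt, so only it is echoed
find BADnew -type f -exec grep -q sorted {} \; -exec echo {} \;

# Real run: rm fires only where the grep predicate succeeded
find BADnew -type f -exec grep -q sorted {} \; -exec rm {} \;
ls BADnew   # only miss.txt remains
```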
Alex
  • The disadvantage with this is that it will have to create millions of processes each of which executes a single `rm` of a single file, which will make it very slow. – Mark Setchell Mar 29 '15 at 22:43
  • Yes, that's true – but it depends heavily on how many files contain the keyword: a new `rm` process is created only on a match. And checking each file's contents will kill I/O anyway. – Alex Mar 29 '15 at 23:17
  • `c=0;forks_num=10;sleep_secs=3 find . -type f -exec grep -q link {} \; -exec echo {} \; | { while read -r file ; do c=$((c+1)) ; test $c -eq "$forks_num" && sleep "$sleep_secs" && export c=0 ; ( echo "match: $file" )& done }` – Yordan Georgiev Apr 12 '15 at 07:15
  • Did you read op's question? Did you try running this on a folder containing **millions** of files? It does not work. `find` builds a list in memory. Running this will destroy your system by consuming all memory, triggering disk swapping. – Cerin Oct 10 '19 at 12:17

The core module File::Find will recursively traverse all the subdirectories and call a subroutine on every file found:

perl -MFile::Find -e 'find( sub { open $f,"<",$_; unlink if grep /sorted/, <$f> }, "BADnew")'
beasy

Try:

find /where -type f -name \* -print0 | xargs -0 grep -lZ sorted | xargs -0 echo rm
#can search for specific ^^^ names                       ^^^^^^            ^^^^
#                                   what should contain the file            |
#                              remove the echo if satisfied with the result +

The above:

  • the find searches for files with the specified name (`*` = any)
  • the `xargs ... grep` lists the files whose contents contain the string
  • the `xargs rm` removes those files
  • it doesn't die with "argument list too long"
  • file names may contain whitespace (the NUL delimiters handle that)
  • it needs a grep that knows `-Z` (e.g. GNU grep)

Also a variant:

find /where -type f -name \* -print0 | xargs -0 grep -lZ sorted | perl -0 -nle unlink
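As a sketch of how the null-delimited pipeline behaves (the names below are invented; note the embedded space in one file name, which `-print0`/`-0` handles safely):

```shell
# Set up a throwaway directory; one file matches, one does not
mkdir -p where
printf 'contains sorted\n' > 'where/a file.txt'
printf 'nothing\n'         > where/other.txt

# Dry run: the echo shows what would be removed
find where -type f -print0 | xargs -0 grep -lZ sorted | xargs -0 echo rm

# Drop the echo to actually delete the matching files
find where -type f -print0 | xargs -0 grep -lZ sorted | xargs -0 rm
ls where    # only other.txt remains
```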
kobame

You haven't made it clear, despite specific questions, whether you require the file *name* or the file *contents* to contain *sorted*. Here are both solutions.

First, `chdir` to the directory you're interested in. If you really need a one-liner for whatever reason, then it is pointless to put the `chdir` inside the program.

cd BADnew

Then you can either unlink all nodes that are files and whose names contain *sorted*:

perl -e'opendir $dh, "."; while(readdir $dh){-f and /sorted/ and unlink}'

or you can open each file and read it to see if its contents contain *sorted*. I hope it's clear that this method will be far slower, not least because you have to read the entire file to establish a negative. Note that this solution relies on setting `@ARGV` so that the `<>` operator opens and reads each file in turn:

perl -e'opendir $dh, "."; while(readdir $dh){-f or next; @ARGV=$f=$_; /sorted/ and unlink($f),last while <>}'
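As a quick check of the contents-based one-liner, you can run it in a scratch directory (the file names below are invented, and this assumes `perl` is on `PATH`):

```shell
# One file whose contents match, one that doesn't
mkdir -p BADnew && cd BADnew
printf 'deeply sorted data\n' > hit.txt
printf 'unrelated text\n'     > miss.txt

# @ARGV is set to the current file, so <> reads its lines;
# on the first line matching /sorted/ the file is unlinked
perl -e'opendir $dh, "."; while(readdir $dh){-f or next; @ARGV=$f=$_; /sorted/ and unlink($f),last while <>}'

ls    # only miss.txt remains
```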
Borodin