Delete multiple files based on partial names list

Question

I have a folder on my Red Hat server with approx. 500k files with various extensions. The naming convention for those files is based on a number, for example:

a123456.csv
z123456.jpg
123456.exe
a234.jpg
234.exe

I designed a query which produces a list of all the numbers that should be deleted. Assuming I export this list daily/weekly into a txt file, what would be the most efficient way to delete all the files in the folders whose numbers appear in the list?

Running a for loop on every folder would take too long since there are too many files. I managed to produce a list of the numbers to delete that actually have files in this folder using:

join <(cat list.txt | sort) <(ls /folder/with/0.5Mfiles | grep -v html$ | sed 's/[a-zA-Z.]*//g' | sort)

but that way I lose the original file name (e.g. z123456.jpg).

What could be the most efficient way to do it?

This sounds like an XY problem. What's in list.txt? Is that a list of filenames or a list of numbers? — glenn jackman, Sep 26 '17 at 14:15

glenn jackman · Answer 1 · 2017-09-26T15:54:56.877

0

How about

while read -r number; do
    echo rm /path/to/folder/*"$number"*
done < lists.txt

Remove "echo" once the printed rm commands look OK.

Yes, when number=1234 the pattern *"$number"* will also match a12345.jpg. Let's try this instead:

$ shopt -s extglob nullglob
$ touch 1234 a1234 1234b c1234d 12345 a12345 12345b c12345d
$ number=1234
$ echo ?(*[^0-9])"$number"?([^0-9]*)
1234 1234b a1234 c1234d

The ?(...) form (available once extglob is enabled) optionally matches the contained pattern, and we use *[^0-9] and [^0-9]* to add a "number boundary": the preceding/following character must be a non-digit. So ?(*[^0-9]) matches either an empty string or a sequence of chars ending with a non-digit, and ?([^0-9]*) matches either an empty string or a sequence of chars starting with a non-digit.
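
Putting the loop and the boundary pattern together might look like the sketch below. The function name delete_listed and its two arguments (the folder and the list file) are made up for illustration, and the check that skips non-numeric lines is an extra assumption about what the list contains. Like the first loop, it only echoes the rm commands.

```shell
#!/usr/bin/env bash
# Sketch only: delete_listed, dir and list are hypothetical names.
# extglob must be enabled before the function below is parsed.
shopt -s extglob nullglob

delete_listed() {
    local dir=$1 list=$2 number
    local -a files
    while IFS= read -r number; do
        # Assumption: the list holds one plain number per line; skip anything else.
        [[ $number =~ ^[0-9]+$ ]] || continue
        # Optional prefix ending in a non-digit, the number itself,
        # then an optional suffix starting with a non-digit.
        files=( "$dir"/?(*[^0-9])"$number"?([^0-9]*) )
        # With nullglob an unmatched pattern expands to nothing, so skip rm.
        if (( ${#files[@]} > 0 )); then
            echo rm -- "${files[@]}"
        fi
    done < "$list"
    return 0
}
```

Call it as `delete_listed /path/to/folder lists.txt` and, as before, drop the echo once the output looks right. Each glob still reads the directory once per number, so this fixes the false matches rather than the speed.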

edited Sep 26 '17 at 15:54

answered Sep 26 '17 at 14:53

glenn jackman


If 1234 is in lists.txt, 12345.jpg will also be deleted – OTG Sep 26 '17 at 15:30
Using a while loop is way too slow when dealing with so many files, I'm trying to come up with a solution using xargs – OTG Sep 27 '17 at 14:33
that would be `printf "%s\0" ?(*[^0-9])"$number"?([^0-9]*) | xargs -0 rm` – glenn jackman Sep 27 '17 at 17:25
