I have a folder on my Red Hat server with approx. 500k files with various extensions. The naming convention for those files is based on a number, for example:

  • a123456.csv
  • z123456.jpg
  • 123456.exe
  • a234.jpg
  • 234.exe

I designed a query which produces a list of all the numbers that should be deleted. Assuming I export this list daily/weekly into a txt file, what would be the most efficient way to delete all the files in the folder whose numbers appear in the list?

Running a for loop over every file in the folder would take too long since there are too many files. I managed to produce a list of all the numbers to delete which have files in this folder using:

join <(cat list.txt | sort) <(ls /folder/with/0.5Mfiles | grep -v html$ | sed 's/[a-zA-Z.]*//g' | sort)

but that way I lose the original file name (e.g. z123456.jpg), so I cannot delete the matching files.

What would be the most efficient way to do it?

OTG

1 Answer

How about

while read -r number; do
    echo rm /path/to/folder/*"$number"*
done < list.txt

This only prints the rm commands. Remove "echo" once the output looks OK.


Yes, when number=1234 the pattern above will also match a12345.jpg. To match only the exact number, let's try this:

$ shopt -s extglob nullglob
$ touch 1234 a1234 1234b c1234d 12345 a12345 12345b c12345d
$ number=1234
$ echo ?(*[^0-9])"$number"?([^0-9]*)
1234 1234b a1234 c1234d

The ?(...) form matches zero or one occurrence of the contained pattern, and we use *[^0-9] and [^0-9]* to add a "number boundary": the character immediately before or after the number must be a non-digit. So ?(*[^0-9]) matches either an empty string or a sequence of chars ending with a non-digit, and ?([^0-9]*) matches either an empty string or a sequence of chars starting with a non-digit.
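
Putting the loop and the boundary pattern together could look roughly like the sketch below. It assumes bash, a list file with one number per line, and numbers made of digits only. The function name delete_listed and its two arguments are made up for illustration. It only prints the rm commands, as in the first loop.

```bash
#!/usr/bin/env bash
# extglob must be enabled before bash parses the function below;
# nullglob makes a non-matching pattern expand to nothing.
shopt -s extglob nullglob

# delete_listed LIST DIR  (hypothetical helper, not part of any tool)
# Prints one "rm" command per number in LIST that has matching files in DIR.
delete_listed() {
    local list=$1 dir=$2 number files
    while IFS= read -r number; do
        [[ -n $number ]] || continue    # skip blank lines
        # Same "number boundary" pattern as above, anchored to DIR.
        files=( "$dir"/?(*[^0-9])"$number"?([^0-9]*) )
        if (( ${#files[@]} > 0 )); then
            echo rm -- "${files[@]}"
        fi
    done < "$list"
}
```

It would be called as delete_listed list.txt /path/to/folder. As before, drop the "echo" once the printed commands look right. Note that each number still triggers one scan of the directory, so with a very long list this may be slow on 500k files.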

glenn jackman