
I have a script containing about 420k "rm -rf" commands like the ones below, which were generated using a "find" statement. Each pdf folder contains between 1 and 30 files (no subfolders).

rm -rf /2012/128/211503/pdf
rm -rf /2012/128/212897/pdf
rm -rf /2012/128/211989/pdf
rm -rf /2012/128/211691/pdf
rm -rf /2012/128/212539/pdf
rm -rf /2012/218/358976/pdf
rm -rf /2012/218/358275/pdf
rm -rf /2012/218/358699/pdf
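
For illustration, a find command along these lines could produce such a script (the output file name delete_pdfs.sh is only a placeholder):

# Hypothetical generator: list every "pdf" directory under /2012 and turn
# each path into an "rm -rf" line.
find /2012 -type d -name pdf -printf 'rm -rf %p\n' > delete_pdfs.sh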

I'm looking for a way to increase the deletion speed of the script.

Currently, vmstat reports mostly I/O wait time.

The platform is RHEL 5, deleting files on a RAID 5/6 array using ext3 and LVM.

I thought about splitting the script file into smaller files (say 10 of them) in order to run several scripts in parallel (sketched below), but I suspect I would just hit a hardware speed limit.

Would that be a good idea if committing each deletion to the ext3 journal takes time, and could it take advantage of a feature like NCQ?
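
For illustration, the splitting idea would look something like this (assuming the script is called delete_pdfs.sh and splitting the 420k lines into 10 chunks of roughly 42000 lines each):

# Split the generated script into ~10 pieces and run them concurrently.
split -l 42000 delete_pdfs.sh chunk_
for f in chunk_*; do
    sh "$f" &   # each chunk deletes its share of the pdf directories
done
wait            # wait for all background chunks to finish
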

DevOps
  • You can use find to delete the files directly; that will be faster than first generating a script with find and then running a script that actually deletes the files. – B14D3 Aug 22 '13 at 13:09
  • @B14D3 You could probably make a real answer of that, with a little expansion. – Michael Hampton Aug 22 '13 at 13:27

1 Answer


If you're using find to generate the script, you should take a look at the -delete action:

Delete files; true if removal succeeded. If the removal failed, an error message is issued. If -delete fails, find's exit status will be nonzero (when it eventually exits). Use of -delete automatically turns on the -depth option.

You could use split to break the file up into chunks. You may get some mileage out of GNU Parallel too.
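
Concretely, the two suggestions might look something like this; the /2012 path is taken from the question's examples and the Parallel job count is just an example:

# Delete the files inside each pdf directory and then the (now empty)
# directory itself; -delete turns on depth-first processing automatically.
find /2012 \( -path '*/pdf/*' -o -type d -name pdf \) -delete

# Or keep a directory list but run a few rm processes at a time with
# GNU Parallel.
find /2012 -type d -name pdf | parallel -j 4 rm -rf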

user9517
  • In fact, we are using find to generate the script so that we can check the folder list for unwanted items. We know how to split the file into smaller chunks, but thanks for the advice. GNU Parallel doesn't seem relevant here. The question was more about hints to reduce the execution time of the commands as a group, e.g. whether the OS does some kind of commit for the deletion of each file. – DevOps Aug 22 '13 at 15:35
  • Say one deletion takes 1 sec of disk I/O plus 1 sec to commit to the ext3 journal, so deleting 5 files one after another would take (1+1)*5 = 10 sec. Run in parallel, like pipelining in a CPU, two files at a time would give 1 sec for the I/O of the first file, then 1 sec for its commit while the I/O of the second file runs, then 1 sec for the second file's commit, i.e. only 3 sec for two files. We were wondering whether this would be worth using, or whether the gain would be negligible or even negative due to concurrent access. – DevOps Aug 22 '13 at 15:44
  • 1
    @TheCodeKiller: Group lots of files onto the same rm command line/ – user9517 Aug 22 '13 at 16:09
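
For instance, one common way to do that grouping is with xargs (the path again comes from the question's examples; this is a sketch, not necessarily what user9517 had in mind):

# Feed the pdf directories to rm in large batches, so process start-up and
# per-command overhead are paid far less often.
find /2012 -type d -name pdf -print0 | xargs -0 rm -rf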