I have a large file system from which I have to delete certain directories from time to time. Currently I have a script which, amongst other things, deletes a folder and subsequently generates an email notification. However, as deleting a directory can take anything from a few seconds to a few days, I would like to do this asynchronously.

I can cook up a solution by, say, generating little snippets like rm -rf /some/directory in the appropriate cron directory, but that might get clogged if a large number of large directories need to be deleted.

Is anyone aware of a better solution?

loris

2 Answers

Deleting a folder by itself should be nearly instantaneous. What likely takes the time is searching the directory tree and deleting the many files and directories within it.

that might get clogged

I don't know what you mean by this.

If you are worried that one run may overlap with the next, why is that an issue? If there is a valid reason for ensuring that only one instance runs at a time, then use a lock file or limit the run time with timeout.
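
As a rough illustration of the lock-file idea, here is a minimal Python sketch using an advisory lock from the standard `fcntl` module. The helper name `single_instance` and the lock path are made up for the example, and it assumes a Unix-like system. A run that cannot get the lock simply learns that another instance is active and can exit.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this process holds the lock, False if someone else does.

    The lock is advisory (flock) and is released when the block exits
    or when the process dies, so a crashed run leaves no stale lock.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # LOCK_NB: fail immediately instead of waiting for the holder.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical lock location; pick one your cron user can write to.
    with single_instance("/tmp/purge-dirs.lock") as have_lock:
        if have_lock:
            print("lock acquired, safe to start deleting")
        else:
            print("another instance is running, exiting")
```

It is probably safest to keep the lock file on a local filesystem, since `flock` behaviour on network or cluster filesystems can depend on mount options.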

symcbean
  • Yes, I am deleting large directory trees. By clogging I mean that if deletions take longer than the ```cron``` interval, the number of deletion processes running could increase in an uncontrolled manner. I'd probably want a mechanism to limit that. – loris May 17 '23 at 12:31

What slows down your deletion is not the file removal itself (such operations are batched in the journal and committed to the main filesystem in large chunks, so they are already asynchronous in a sense) but the synchronous reads needed to discover what to delete. In other words, it is the metadata traversal needed to list all the inodes to be deleted that accounts for the biggest hit, by far. Unfortunately, there is no real escape from that.

Some things you can do:

  • use a fast cache device to cache as much metadata as possible
  • use disposable volumes/filesystems, where "delete many files" becomes "simply discard the entire volume or filesystem"
  • schedule partial, progressive deletion via cron or similar tools
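
The last point, partial and progressive deletion, could be sketched roughly as follows in Python. The function name `delete_for` and the idea of a per-run time budget are assumptions made for the example, not something taken from a particular tool. Each cron run removes as much as it can within the budget and the next run picks up whatever is left.

```python
import os
import time


def delete_for(root, budget_seconds):
    """Delete the tree under root bottom-up until done or out of time.

    Returns True once root itself has been removed, False if work remains.
    No error handling here: a permission error will raise.
    """
    deadline = time.monotonic() + budget_seconds
    # topdown=False visits the deepest directories first, so each
    # directory is already empty of subdirectories when we reach it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            if time.monotonic() >= deadline:
                return False
            os.unlink(os.path.join(dirpath, name))
        for name in dirnames:
            # Symlinks to directories are listed here but never walked into.
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                os.unlink(path)
        if time.monotonic() >= deadline:
            return False
        os.rmdir(dirpath)
    return True
```

A cron job could call `delete_for(path, 600)` every hour, for example, and only send the notification email once it returns True. Note that each run still has to walk the remaining part of the tree, so the metadata traversal cost does not go away; it is only spread out.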

For more info about delete performance and other things which slow down file removal, you can read this answer.

shodanshok
  • The directories I want to delete are actually the ```home``` and ```scratch``` (on GPFS and Lustre, respectively) directories of former users of an HPC system. I don't have much latitude to tweak the basic configuration, but I am happy to just deal with the problem at the level of directories. I don't really care that deletion will take a long time, I just don't want it to delay the script which performs the other housekeeping activities associated with removing a user. I guess I'll just generate some sort of list of directories which can then be removed by a ```cron``` job. – loris May 17 '23 at 12:28
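
The "list of directories plus cron job" idea from the comment above might look something like this minimal Python sketch. The spool layout and the names `enqueue` and `drain` are hypothetical. The housekeeping script only calls `enqueue`, which returns almost immediately, and a cron job calls `drain`, with `max_jobs` capping how much is started per run.

```python
import os
import shutil
import tempfile


def enqueue(spool_dir, target):
    """Record target for later deletion and return the job file path."""
    os.makedirs(spool_dir, exist_ok=True)
    # Write to a hidden temp file first, then rename, so the worker
    # never sees a half-written job file.
    fd, tmp = tempfile.mkstemp(dir=spool_dir, prefix=".tmp-")
    with os.fdopen(fd, "w") as fh:
        fh.write(os.path.abspath(target) + "\n")
    job_name = os.path.basename(tmp)[len(".tmp-"):] + ".job"
    final = os.path.join(spool_dir, job_name)
    os.rename(tmp, final)
    return final


def drain(spool_dir, max_jobs=1):
    """Delete up to max_jobs queued directories, oldest job first.

    Returns the list of paths that were processed in this run.
    """
    jobs = sorted(
        (e for e in os.scandir(spool_dir) if e.name.endswith(".job")),
        key=lambda e: e.stat().st_mtime,
    )
    done = []
    for entry in jobs[:max_jobs]:
        with open(entry.path) as fh:
            target = fh.read().strip()
        # ignore_errors=True: a target that is already gone is not fatal.
        shutil.rmtree(target, ignore_errors=True)
        os.unlink(entry.path)
        done.append(target)
    return done
```

Because the job file is only removed after the tree is gone, an interrupted run just retries the same directory next time. In real use one would probably also want to check that each target lies under an expected parent directory before deleting anything.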