I went to check how to remove a directory in Python, and was led to shutil.rmtree(). Its speed surprised me, compared to what I'd expect from rm --recursive. Are there faster alternatives, short of using the subprocess module?

-
How big/deep is your directory? Do you have a few directories with many files, or very deep hierarchies? – David Cournapeau Mar 29 '11 at 10:24
-
@DavidCournapeau: It's a bunch of build directories, so it's quite a deep hierarchy. – tshepang Mar 29 '11 at 10:26
-
I ended up here because shutil was too slow for my use case. Talking about 10-20 directories each containing ten to fifteen thousand files, totalling 40 GB of data (most files are text, but some are images or videos). And I have 20 backups that I must delete (800 GB of data). For my use case at least, shutil is really too slow. – Adrien H May 02 '19 at 09:16
3 Answers
The implementation does a lot of extra processing:
def rmtree(path, ignore_errors=False, onerror=None):
    """Recursively delete a directory tree.
    If ignore_errors is set, errors are ignored; otherwise, if onerror
    is set, it is called to handle the error with arguments (func,
    path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
    path is the argument to that function that caused it to fail; and
    exc_info is a tuple returned by sys.exc_info(). If ignore_errors
    is false and onerror is None, an exception is raised.
    """
    if ignore_errors:
        def onerror(*args):
            pass
    elif onerror is None:
        def onerror(*args):
            raise
    try:
        if os.path.islink(path):
            # symlinks to directories are forbidden, see bug #1669
            raise OSError("Cannot call rmtree on a symbolic link")
    except OSError:
        onerror(os.path.islink, path, sys.exc_info())
        # can't continue even if onerror hook returns
        return
    names = []
    try:
        names = os.listdir(path)
    except os.error, err:
        onerror(os.listdir, path, sys.exc_info())
    for name in names:
        fullname = os.path.join(path, name)
        try:
            mode = os.lstat(fullname).st_mode
        except os.error:
            mode = 0
        if stat.S_ISDIR(mode):
            rmtree(fullname, ignore_errors, onerror)
        else:
            try:
                os.remove(fullname)
            except os.error, err:
                onerror(os.remove, fullname, sys.exc_info())
    try:
        os.rmdir(path)
    except os.error:
        onerror(os.rmdir, path, sys.exc_info())
Note the os.path.join() used to create new filenames; string operations do take time. The rm(1) implementation instead uses the unlinkat(2) system call, which doesn't do any additional string operations. (In fact, it saves the kernel from walking through an entire namei() just to find the common directory, over and over again. The kernel's dentry cache is good and useful, but that can still be a fair amount of in-kernel string manipulation and comparison.) The rm(1) utility gets to bypass all that string manipulation and just use a file descriptor for the directory.

Furthermore, both rm(1) and rmtree() check the st_mode of every file and directory in the tree; but the C implementation does not need to turn every struct stat into a Python object just to perform a simple integer mask operation. I don't know how long this step takes, but it happens once for every file, directory, pipe, symlink, etc. in the directory tree.
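On Python 3, both overheads can be trimmed from pure Python: os.scandir() reports each entry's type from the directory read itself (no separate lstat() per file), and the dir_fd keyword routes deletes through unlinkat(2). A minimal sketch, assuming a POSIX system where os.unlink supports dir_fd (fast_rmtree is a hypothetical helper, not part of shutil):

```python
import os

def fast_rmtree(path):
    # Sketch only: no error hooks, no symlink-root guard like shutil's.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        with os.scandir(fd) as it:  # entry type comes from the dirent, no extra stat
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    fast_rmtree(os.path.join(path, entry.name))
                else:
                    # dir_fd makes this an unlinkat(2) on just the basename,
                    # skipping the repeated full-path lookup
                    os.unlink(entry.name, dir_fd=fd)
    finally:
        os.close(fd)
    os.rmdir(path)
```

Modern shutil.rmtree already uses a similar scandir/fd-based strategy internally, so the gap described above has narrowed on current Python versions.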

-
Forget the string manipulation, it's irrelevant. The other disk accesses are the speed difference. – Ned Batchelder Mar 29 '11 at 13:03
-
Not necessarily - it could be significant if the cache is hot (which is likely if the shutil.rmtree is done on a build tree just after a build). – David Cournapeau Mar 29 '11 at 13:49
-
@AdrienH, indeed, the alternatives mostly involve "don't do this work in Python if performance is your top concern". Using `rsync` or `perl` can give good results: https://unix.stackexchange.com/a/79656/7064 – sarnold May 02 '19 at 17:30
If you care about speed:
os.system('rm -fr "%s"' % your_dirname)
Apart from that, I did not find shutil.rmtree() much slower... of course there is extra overhead involved at the Python level. And apart from that, I would only believe such a claim if you provide reasonable numbers.
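One caveat with the %-formatted os.system() string: a directory name containing a double quote (or other shell syntax) breaks it. On Python 3.5+, a list-argument subprocess call sidesteps quoting entirely; a sketch, with rm_rf as a hypothetical helper name:

```python
import subprocess

def rm_rf(dirname):
    # A list argv skips the shell, so the name needs no quoting and cannot
    # inject shell syntax; "--" stops option parsing in case the name
    # starts with a dash. POSIX-only, just like the os.system() variant.
    subprocess.run(["rm", "-rf", "--", dirname], check=True)
```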

-
By *short of using subprocess module*, I actually meant no external system calls like os.system(). – tshepang Mar 29 '11 at 10:30
-
It depends: calling os.system() or subprocess can be much slower. If you call it often, the operating system needs to create a lot of processes, and then the Python version in shutil will be faster. – guettli Aug 21 '12 at 15:12
-
For a directory with about 15,000 small (<10KB) files (and nothing else), it was taking several minutes with no progress. Deleting it the other way was much, much faster. – Max Candocia Feb 29 '16 at 17:57
While I do not know what's wrong, you can try other methods, e.g. remove all the files first and then the directory:
for r, d, f in os.walk("path"):
    for files in f:
        os.remove(os.path.join(r, files))
    os.removedirs(r)

-
I thumbed up, however, `os.removedirs(r)` removes the root and not the empty dirs, right? – pebox11 Jun 22 '16 at 11:35