This question is not tied directly to Python, but I need a working implementation for python32 on Windows.

Starting from this answer, I assume that shutil.rmtree() is really slow on Windows (I need to delete more than 3M files a day and it takes more than 24 hours), so I wanted to use subprocess.call() with rmdir. But since I have cygwin in my %PATH% system variable, the wrong rmdir gets called and I get this:

>>> args = ['rmdir', r'D:\tmp']
>>> subprocess.call(args)
cygwin warning:
  MS-DOS style path detected: D:\tmp
  Preferred POSIX equivalent is: /cygdrive/d/tmp
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
rmdir: failed to remove `D:\\tmp': Directory not empty
1

Note: I know it's required to use /S /Q to delete folders recursively.

How can I ensure that the right rmdir is called (on Linux you would use an absolute path such as /bin/rm), preferably without using shell=True?

Is there an alternative utility for that (something like using robocopy /MIR)?


Edit: speed comparison

I've tested different methods of deleting 237 GB (255,007,568,228 bytes) in 1,257,449 files and 750,251 folders, timing each run with Measure-Command.

+-------------------+-------------+----------+-----------------+
|                   | rmdir /s /q |  shutil  | SHFileOperation |
+-------------------+-------------+----------+-----------------+
| Hours             |           3 |        5 |               6 |
| Minutes           |          26 |       52 |              14 |
| Seconds           |          46 |       13 |              48 |
| TotalMinutes      |         207 |      352 |             375 |
| TotalSeconds      |       12406 |    21134 |           22488 |
| TotalMilliseconds |    12406040 | 21133805 |        22488436 |
+-------------------+-------------+----------+-----------------+

Note: the test was run on a production server, so the results may be affected.

Vyktor
  • If you have cygwin in your path you could always use `rm -rf` instead. – FatalError Aug 01 '14 at 13:32
  • The error you're getting seems to be related to the directory not being empty. Windows will throw this error if there are files in it. Add the `/s` switch to make it recursive? – Tadgh Aug 01 '14 at 13:34
  • @FatalError unfortunately the script is going to run on several machines and I can't rely on them to have cygwin installed (I could check for cygwin and call cygwin and so on, but I wanted an elegant and simple solution) – Vyktor Aug 01 '14 at 13:35
  • @Tadgh It doesn't really matter in this case, because the whole point is that the wrong `rmdir` (the one from cygwin) gets called. I don't want to write code that depends on no one creating a file `rmdir.exe` in the `cwd` of the process. – Vyktor Aug 01 '14 at 13:36
  • "*I assume that using shutil.rmtree() is really slow*" - Do you assume that, or have you measured it? – Robᵩ Aug 01 '14 at 15:47
  • `win32com.shell.shell.SHFileOperation()` is a possibility. – Robᵩ Aug 01 '14 at 15:53
  • @Robᵩ I didn't measure it precisely, but deleting 3M files from a python script took more than 24 hours (although that could be because it runs from Task Scheduler, possibly with lower priority), while with Far Manager it took less than 24 hours to delete 9M files. I found the documentation for SHFileOperation on MSDN, but I probably didn't spend enough time digging in `win32com`. If you make an answer from it (preferably with a minimal example) I'll be happy to accept it (unless something better shows up). – Vyktor Aug 01 '14 at 18:23
  • Hopefully someone with better Windows expertise than me will answer and flesh out the SHFileOperation call. I'm curious: how long does it take to delete the files if you just do 'rd /s' in the CMD prompt? – Robᵩ Aug 01 '14 at 18:36
  • @Robᵩ I finally got to measure required times properly, check the results if you are interested. – Vyktor Aug 21 '14 at 06:28

4 Answers


Calling proper rmdir

I came up with the idea of calling cmd.exe /C directly from %SYSTEMROOT%\System32 with the environment variables cleared (and it seems to work):

import os
import subprocess


def native_rmdir(path):
    ''' Removes directory recursively using native rmdir command
    '''

    # Get path to cmd.exe (cached on the function after the first call)
    try:
        cmd_path = native_rmdir._cmd_path
    except AttributeError:
        cmd_path = os.path.join(
            os.environ.get('SYSTEMROOT', r'C:\Windows'),
            'System32', 'cmd.exe')
        native_rmdir._cmd_path = cmd_path

    # cmd /C   - cmd will terminate after the command is carried out
    # rmdir /S - remove the directory tree recursively
    # rmdir /Q - quiet mode, do not ask for confirmation
    args = [cmd_path, '/C', 'rmdir', '/S', '/Q', path]
    # Empty environment, so PATH cannot influence which rmdir is run
    subprocess.check_call(args, env={})


native_rmdir(r'D:\tmp\work with spaces')

I assume that this will work under any version of Windows regardless of the system-wide PATH, but I would still prefer something more "elegant".

This will delete all files it can (it won't stop after the first error).
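If the caller would rather get a True/False result than an exception, the same call could be wrapped roughly as follows. This is only a sketch: the names cmd_exe_path and try_native_rmdir are hypothetical, ntpath is used so that the path is always built with backslashes, and it assumes cmd.exe exits with a non-zero status when rmdir could not remove everything.

```python
import ntpath
import os
import subprocess


def cmd_exe_path(environ=None):
    """Build the absolute path to cmd.exe from SYSTEMROOT (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Fall back to the usual install location if SYSTEMROOT is not set.
    system_root = environ.get('SYSTEMROOT', r'C:\Windows')
    return ntpath.join(system_root, 'System32', 'cmd.exe')


def try_native_rmdir(path):
    """Run the built-in rmdir /S /Q on path; return True if path is gone."""
    args = [cmd_exe_path(), '/C', 'rmdir', '/S', '/Q', path]
    try:
        # Empty environment, as in native_rmdir above.
        subprocess.check_call(args, env={})
    except subprocess.CalledProcessError:
        # Assumption: a non-zero exit status means something was left behind.
        return False
    return not os.path.exists(path)
```

A caller could then retry later or log the directory when try_native_rmdir(r'D:\tmp\work with spaces') returns False.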


Using SHFileOperation()

It's also possible to use SHFileOperation() to do this [example source]:

from win32com.shell import shell, shellcon
shell.SHFileOperation((0, shellcon.FO_DELETE, r'D:\tmp\del', None, shellcon.FOF_NO_UI))

This will stop after the first error (when I was testing this in my environment this solution tended to be slower than shutil.rmtree(), probably because UI was involved somehow).

Vyktor
  • This is the solution I would have suggested and I think its elegant enough - you need a specific shell and you call it directly. – tdelaney Aug 01 '14 at 14:56
  • I find your first solution best. `rd` is an internal command and not a binary (external) so naturally `cmd.exe` is the one you have to call for it. – konsolebox Aug 04 '14 at 10:05
  • This was very helpful! Instant at least 50% time reduction from shutil.rmtree(). One important thing to note is that this raises `subprocess.CalledProcessError` if for example some of the files to be deleted are in use by another process. – Niko Föhr Mar 28 '20 at 20:06

Use the built-ins os.walk, os.remove and os.rmdir

The main thing to be careful about is Windows paths. Either use / as path separators instead of \, or use raw strings.

But it is probably best to use os.path.normpath on path names that you get, for example, from the command line.

In the code that follows, topdown=False is essential.

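import os

# path: the directory whose contents should be deleted
path = os.path.normpath(path)
# topdown=False yields the deepest directories first, so each directory
# is already empty by the time os.rmdir reaches it
for root, dirs, files in os.walk(path, topdown=False):
    for f in files:
        os.remove(os.path.join(root, f))
    for d in dirs:
        os.rmdir(os.path.join(root, d))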

A possible speed improvement might be to gather all the file paths in a list, and use that with multiprocessing.Pool.map() to delete the files using multiple processes. Afterwards you could then use os.removedirs to mop up the empty directories. But this solution might also overwhelm the disk subsystem.
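A rough sketch of that idea follows. Note that it swaps in a thread pool (concurrent.futures.ThreadPoolExecutor) for multiprocessing.Pool, on the assumption that deletion is I/O-bound and that threads avoid the process start-up cost; whether it is actually faster has to be measured. The function name parallel_rmtree and the workers default are hypothetical, and the sketch assumes the tree contains no symbolic links.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_rmtree(path, workers=8):
    """Delete all files concurrently, then remove the emptied directories."""
    path = os.path.normpath(path)
    file_paths = []
    dir_paths = []
    # Bottom-up walk: child directories are collected before their parents.
    for root, dirs, files in os.walk(path, topdown=False):
        file_paths.extend(os.path.join(root, f) for f in files)
        dir_paths.extend(os.path.join(root, d) for d in dirs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all deletions and re-raises the first error, if any.
        list(pool.map(os.remove, file_paths))
    # Directories are empty now; remove them children first, then the root.
    for d in dir_paths:
        os.rmdir(d)
    os.rmdir(path)
```

Unlike the loop above, this also removes path itself. Keeping workers small is probably wise, for the disk-load reason already mentioned.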

Roland Smith

Yes, I've found this alias, but there's the same issue... If someone created rd.exe (or one gets installed anywhere in the PATH variable) it won't work. It doesn't really matter in this case, because the whole point is that the wrong rmdir (the one from cygwin) gets called. I don't want to write code that depends on no one creating a file rmdir.exe in the cwd of the process.

So is it the "anywhere in the path" or the current working directory that's the concern? If it's the cwd, then:

 if os.path.exists('rmdir.exe'):
     raise BadPathError("don't run this in an insecure directory")

But the underlying problem is that you are allowing this to run from a directory where someone can create rmdir.exe. Yes, Windows permissions are weak, but it's not that hard to work around.
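If the PATH directories are a concern as well, the same check could be extended along these lines. This is a sketch only: BadPathError is not a built-in exception, so it is defined here, and find_shadowing and ensure_not_shadowed are hypothetical names. It only looks for exact file names and ignores other executable extensions that Windows may try (such as .com or .bat).

```python
import os


class BadPathError(Exception):
    """Raised when a directory contains a file that could shadow rmdir."""


def find_shadowing(names, directories):
    """Return the paths of files in directories named like one of names."""
    hits = []
    for directory in directories:
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                hits.append(candidate)
    return hits


def ensure_not_shadowed(names=('rmdir.exe', 'rd.exe'), directories=None):
    """Raise BadPathError if the cwd or a PATH directory holds such a file."""
    if directories is None:
        path_dirs = [d for d in os.environ.get('PATH', '').split(os.pathsep) if d]
        directories = [os.getcwd()] + path_dirs
    hits = find_shadowing(names, directories)
    if hits:
        raise BadPathError("rmdir may be shadowed by: " + ", ".join(hits))
```

Calling ensure_not_shadowed() once at start-up would refuse to run on a machine like the asker's, where cygwin's rmdir.exe is on the PATH, so it fits better as a diagnostic than as a fix.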

msw

As documented here, it seems that rmdir has an alias, rd. I'm unable to test it, but you could try this:

>>> args = ['rd', r'D:\tmp', '/s', '/q']
>>> subprocess.call(args)

There might be some restrictions on removing hidden files or system files - again I am unable to test it.

mhawke
  • Yes, I've found this alias, but there's the same issue... If someone created `rd.exe` (or one gets installed anywhere in the `PATH` variable) it won't work. – Vyktor Aug 01 '14 at 13:58
  • Possibly I am missing the point, but why would someone create `rd.exe`? I can understand `rmdir` being a problem for users with cygwin installed, but `rd` is not a cygwin command. Is `shutil.rmtree()` really that slow in your circumstance? – mhawke Aug 01 '14 at 14:08
  • It has to delete more than 3M files a day and on batch that size it takes more than 24 hours. – Vyktor Aug 01 '14 at 14:12