There are other questions on SO that get close to answering mine, but I have a very specific use case that I have trouble solving. Consider this:

from asyncio import create_subprocess_exec, run
from shlex import split


async def main():
    command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
    # split the command string into a list of arguments before unpacking it
    proc = await create_subprocess_exec(*split(command))
    await proc.wait()


run(main())

This causes trouble, because program.exe is called with these arguments:

['C:\\some folder', '-o\\server\\share\\some folder', 'a "quote"']

That is, the double backslash is no longer there, as shlex.split() removes it. Of course, I could instead (as other answers suggest) do this:

    proc = await create_subprocess_exec(*split(command, posix=False))

But then program.exe is effectively called with these arguments:

['"C:\\some folder"', '-o"\\\\server\\share\\some folder"', '"a \\"', 'quote\\""']

That's also no good, because now the double quotes have become part of the content of the first parameter, where they don't belong, even though the second parameter is now fine. The third parameter has become a complete mess.

Replacing backslashes with forward slashes, or removing quotes with regular expressions, doesn't work either, for similar reasons.

Is there some way to get shlex.split() to leave double backslashes before server names alone? Or just at all? Why does it remove them in the first place?

Note that, by themselves, these are perfectly valid commands (on Windows and Linux respectively, anyway):

program.exe "C:\some folder" -o"\\server\share\some folder"
echo "hello \"world\""

And even if I did detect the OS and used posix=True/False accordingly, I'd still be stuck with the double quotes included in the second argument, which they shouldn't be.

Grismar

3 Answers

For now, I ended up with this (arguably a bit of a hack):

from os import name as os_name
from shlex import split


def arg_split(args, platform=os_name):
    """
    Like calling shlex.split, but sets `posix=` according to platform 
    and unquotes previously quoted arguments on Windows
    :param args: a command line string consisting of a command with arguments, 
                 e.g. r'dir "C:\Program Files"'  
    :param platform: a value like os.name would return, e.g. 'nt'
    :return: a list of arguments like shlex.split(args) would have returned
    """
    return [a[1:-1].replace('""', '"') if a[0] == a[-1] == '"' else a
            for a in (split(args, posix=False) if platform == 'nt' else split(args))]

Using this instead of shlex.split() gets me what I need, without breaking UNC paths. I'm sure there are edge cases where escaped double quotes aren't handled correctly, but it has worked for all my test cases and seems to work for all practical cases so far. Use at your own risk.

@balmy made the excellent observation that most people should probably just use:

command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
proc = await create_subprocess_shell(command)

Instead of

command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
proc = await create_subprocess_exec(*split(command))

However, note that this means:

  • it's not easy to check or replace individual arguments
  • you have the problem that always comes with using create_subprocess_shell: if part of your command is based on external input, someone can inject code. In the words of the documentation (https://docs.python.org/3/library/asyncio-subprocess.html):

It is the application’s responsibility to ensure that all whitespace and special characters are quoted appropriately to avoid shell injection vulnerabilities. The shlex.quote() function can be used to properly escape whitespace and special shell characters in strings that are going to be used to construct shell commands.

And that's still a problem, as quote() also doesn't work correctly for Windows (by design).
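One possible workaround for the quoting side, sketched here under assumptions: the standard library's subprocess module has a helper, list2cmdline, that joins an argument list into a single command line using the Microsoft C runtime quoting rules. It is not part of the documented API, so treat relying on it as an assumption. It also does not escape cmd.exe metacharacters such as & or |, so it does not make untrusted input safe for a shell. The replacement path below is made up for illustration.

```python
from subprocess import list2cmdline

# Arguments as a list, so individual items are easy to inspect or replace.
args = ['program.exe', r'C:\some folder', r'-o\\server\share\some folder', 'a "quote"']

# Hypothetical edit: swap out one argument before rebuilding the command.
args[1] = r'D:\other folder'

# list2cmdline is an undocumented helper; it quotes arguments containing
# whitespace and escapes embedded double quotes with a backslash.
command = list2cmdline(args)
print(command)
```

The UNC path keeps both leading backslashes, because backslashes are only doubled when they directly precede a double quote.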

I'll leave the question open for a bit, in case someone wishes to point out why the above is a really bad idea, or if someone has a better one.

Grismar

As far as I can tell, the shlex module is the wrong tool if you are dealing with the Windows shell.

The first paragraph of the docs says (my italics):

The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell.

Admittedly, that talks about just one class, not the entire module. Later, the docs for the quote function say (boldface in the original, this time):

Warning The shlex module is only designed for Unix shells.

To be honest, I'm not sure what the non-Posix mode is supposed to be compatible with. It could be, but this is just me guessing, that the original versions of shlex parsed a syntax of their own which was not quite compatible with anything else, and that Posix mode was added later to actually be compatible with Posix shells. This mailing list thread, including this mail from ESR, seems to support this.
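If shlex really is the wrong tool here, one alternative is a small splitter that follows the quoting rules most Windows programs get from the Microsoft C runtime: backslashes are literal unless they directly precede a double quote. The following is only a rough sketch of those rules, not a complete implementation. The name split_windows is made up for illustration, and it deliberately ignores the special handling of the program name and of "" inside a quoted section.

```python
def split_windows(command_line):
    """Split a command line roughly the way the MS C runtime builds argv."""
    args = []
    current = []
    in_quotes = False
    started = False   # True once the current argument has begun
    backslashes = 0   # run of backslashes seen but not yet emitted

    for ch in command_line:
        if ch == '\\':
            backslashes += 1
            continue
        if ch == '"':
            # 2n backslashes + quote   -> n backslashes, quote toggles quoting
            # 2n+1 backslashes + quote -> n backslashes, literal quote
            current.append('\\' * (backslashes // 2))
            if backslashes % 2:
                current.append('"')
            else:
                in_quotes = not in_quotes
            backslashes = 0
            started = True
            continue
        if backslashes:
            # Backslashes not followed by a quote are kept as they are.
            current.append('\\' * backslashes)
            backslashes = 0
            started = True
        if ch in ' \t' and not in_quotes:
            if started:
                args.append(''.join(current))
                current = []
                started = False
        else:
            current.append(ch)
            started = True

    if backslashes:
        current.append('\\' * backslashes)
        started = True
    if started:
        args.append(''.join(current))
    return args


command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
for arg in split_windows(command):
    print(arg)
```

On the command from the question, this keeps the leading double backslash of the UNC path, drops the surrounding quotes, and turns \" into a literal quote.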

Ture Pålsson
  • You're not wrong to point it out, but it doesn't really help to answer the question. Another thing to point out would be that there have been attempts to write `shlex` for Windows, specifically `winshlex`, but sadly that particular effort has the same problem when it comes to UNC paths (although it solves a few others) – Grismar Nov 10 '21 at 23:07
  • @Grismar You are right — this should probably have been a comment, but it ended up much too long for that! – Ture Pålsson Nov 11 '21 at 04:48

For the -o parameter, put the leading " at the start of it, not in the middle, and double the backslashes.

Then use posix=True

import shlex

command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'

print( "Original command Posix=True", shlex.split(command, posix=True) )

command = r'program.exe "C:\some folder" "-o\\\\server\\share\\some folder" "a \"quote\""'

print( "Updated command Posix=True", shlex.split(command, posix=True) )

result:

Original command Posix=True ['program.exe', 'C:\\some folder', '-o\\server\\share\\some folder', 'a "quote"']
Updated command Posix=True ['program.exe', 'C:\\some folder', '-o\\\\server\\share\\some folder', 'a "quote"']

The backslashes are still double in the result, but that's standard Python representation of a \ in a string.
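A small sketch to make both points concrete: why POSIX mode collapses a doubled backslash inside double quotes, and how the repr of a string differs from its actual content. The sample paths are only illustrations.

```python
import shlex

# Inside double quotes, POSIX mode treats a backslash as an escape only
# before another backslash or a double quote, so \\ collapses to \.
collapsed = shlex.split(r'"\\server\share"')[0]

# Single quotes disable escaping entirely, so nothing is collapsed.
preserved = shlex.split(r"'\\server\share'")[0]

# repr() (which is what printing a list uses) doubles every backslash;
# printing the string itself shows its real content.
print(repr(collapsed), collapsed)
print(repr(preserved), preserved)
```

Single quotes are not quoting characters on the Windows command line, so this explains the behaviour rather than fixing batch-style commands.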

  • I agree on putting the quotes outside the `-o` option, although there are some use cases where that doesn't work correctly either; that has more to do with issues in the software I'm dealing with. However, selectively having to double up on backslashes would be a new problem. The key thing is that I need to be able to process commands as they appear in batch scripts, so I have no control over something like where backslashes are doubled up and where they aren't - I just want a split that doesn't mess with the number of slashes / backslashes, but it appears that `shlex.split` won't be it. – Grismar Nov 10 '21 at 10:21
  • Why not avoid the problem by using `asyncio.create_subprocess_shell()`? – DisappointedByUnaccountableMod Nov 10 '21 at 16:23
  • @balmy that's probably the winning suggestion for most people facing this problem. In my case, I'm calling someone else's code where `create_subprocess_exec` is used, so I don't have the option - but it's definitely the best suggestion. In fact, I may just duplicate the part of their code that does this to override theirs, as it more or less stands alone and it may be less of a hack (though a bit more work) than trying to 'fix' `shlex.split`. However, part of what I want to do is filter out some arguments and replace specific others - that job gets a lot harder without `shlex.split`. – Grismar Nov 10 '21 at 22:32