
The Python standard library provides both distutils.util.split_quoted and shlex.split.

Is there any input s for which distutils.util.split_quoted(s) gives a different result from shlex.split(s)?

Eric

1 Answer

Yes. The two algorithms disagree about what counts as whitespace: shlex hardcodes the four characters ' \t\r\n', whereas distutils builds its regex from string.whitespace. As a result, distutils additionally treats form feed and vertical tab as separators.

formfeed:

>>> import distutils.util, shlex
>>> distutils.util.split_quoted('A\fB')
['A', 'B']
>>> shlex.split('A\fB')
['A\x0cB']

vertical tab:

>>> distutils.util.split_quoted('A\vB')
['A', 'B']
>>> shlex.split('A\vB')
['A\x0bB']
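
The difference between the two separator sets can also be computed directly. The following is a minimal sketch, not taken from either library's source: it assumes that string.whitespace stands in for the set distutils used (as described above) and reads shlex's default set from a shlex.shlex instance. It avoids importing distutils, which was removed from the standard library in Python 3.12.

```python
import shlex
import string

# Default separator characters of a shlex lexer (documented as ' \t\r\n').
shlex_ws = set(shlex.shlex("").whitespace)

# Assumption: string.whitespace represents the set used by
# distutils.util.split_quoted, as described in the answer above.
distutils_ws = set(string.whitespace)

# Characters that only the distutils-style set treats as separators.
extra = sorted(distutils_ws - shlex_ws)
print([repr(c) for c in extra])

# shlex.split keeps each of these characters inside a single token.
for ch in extra:
    print(repr(ch), shlex.split(f"A{ch}B"))
```

Any character reported in `extra` is a candidate input on which the two functions disagree.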
wim
  • Is one "more correct" than the other? For example, does POSIX specify a set of whitespace characters to be used as delimiters? – Eric May 03 '19 at 01:26
  • It's complicated. POSIX says to respect the IFS variable (for example, you can set IFS to commas and spaces, and then `python3 myscript.py a,b` will get `["myscript.py", "a", "b"]` in `sys.argv`). If IFS is unset, the shell splits on spaces, tabs, and newlines only; it should *not* field split on form feed, vertical tab, or even carriage return (I have verified on a POSIX-compliant system of mine that \r does not split words). So I would say neither is strictly correct, but `shlex.split` is perhaps the "more correct" of the two. – wim May 03 '19 at 02:30