2

I have a need to split a series of strings into 3 component parts denoted by spaces. These strings sometimes contain sublists, but always as the last component of the string.

I was previously using Shlex to achieve this with great success, but I'm not getting the desired results anymore as my most recent sub lists contain spaces of their own and that seems to throw Shlex off.

Is there an alternative to Shlex that might perform the task better?

Some examples are:

'BREAKFAST IN ["Rolled Oats","Cornflakes","French Toast"]'

and

COPIES_FOR_EXTERNAL > "0"

Which should become lists like:

['BREAKFAST','IN', '["Rolled Oats","Cornflakes","French Toast"]']

and

['COPIES_FOR_EXTERNAL','>','"0"']
danspants
  • 3,099
  • 3
  • 27
  • 33
  • shlex does shell-like parsing. The results you're getting are, well, what a shell would do (inasmuch as word-splitting behavior is concerned). Which is to say that the behavior you want from it is far, *far* outside its intended use case and specification. – Charles Duffy Apr 29 '16 at 01:18
  • Yeah, that's what I figured, though it did a great job before the added complexity came along. I'm thinking a regex might be the best solution. – danspants Apr 29 '16 at 01:20
  • Personally, I tend to be in the "build-a-real-parser" camp. Sure, it's more work and more code, but what you get out of it is considerably more rigorously defined (and is capable of handling contextual rule changes and other anomalies that require hackery to be dealt with in regexes). – Charles Duffy Apr 29 '16 at 01:22
  • ...https://wiki.python.org/moin/LanguageParsing has a huge list of parser generators; personally, if starting something new in Python today without requirements driving things one way or the other, I'd likely go with http://igordejanovic.net/Arpeggio/. – Charles Duffy Apr 29 '16 at 01:25

1 Answers1

1

Since you know the number of components and that the sublist is always a last element you can use str.split with maxsplit parameter:

s1 = 'BREAKFAST IN ["Rolled Oats","Cornflakes","French Toast"]'
s2 = 'COPIES_FOR_EXTERNAL > "0"'

print s1.split(None, 2) # ['BREAKFAST', 'IN', '["Rolled Oats","Cornflakes","French Toast"]']
print s2.split(None, 2) # ['COPIES_FOR_EXTERNAL', '>', '"0"']
niemmi
  • 17,113
  • 7
  • 35
  • 42
  • So obvious. Now of course I'm worried about why the original developer used Shlex... what did he know that I did not... I'm sure something will rear it's ugly head in the future but until then this will do the trick! – danspants Apr 29 '16 at 03:27