Say I have a string:

mystr = "my name is some good name"

# I want to split at white space except for the part "name is"
expectedoutput = ["my", "name is", "some", "good", "name"]

How can I do it with and without shlex?

This is what I have been trying:

import shlex

def careful_split(inputstr, donot_split = "name is"):
    strlex = shlex.shlex(inputstr, commenters =?, posix = ?)
    strlex.wordchars = ?
    #and other shlex function

    return list(strlex)
everestial007
  • What's the logic behind skipping the `name is` part? – rdas Oct 16 '19 at 15:46
  • There is no logic, except that I provide the function with a list of word combinations that should not be split, e.g. `def careful_split(inputstr, donot_split = "name is"):` – everestial007 Oct 16 '19 at 15:53

1 Answer

You can use a regex with negative lookahead.

import re

re.split(r'(?!name)\s+(?!is)', mystr)

An example with more cases:

>>> mystr = "my name is some good name is hi name"
>>> re.split(r'(?!name)\s+(?!is)', mystr)
['my', 'name is', 'some', 'good', 'name is', 'hi', 'name']

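Note that this will not split any phrase matching `.*name is.*`, so "name isn't" will also be kept together. In fact the leading `(?!name)` never blocks a split, because the character right after that position is always whitespace; only `(?!is)` has an effect, so no split happens before any word beginning with "is" (for example "this is"). I am not sure what your desired behavior is in these cases.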
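If those edge cases matter, one possible alternative is to match tokens rather than split on whitespace. The sketch below is only an illustration, not part of the answer above: `careful_split` takes a tuple of protected phrases (an assumed signature based on the question), and `careful_split_shlex` is a hypothetical helper showing a rough shlex-based variant. The shlex version assumes the input contains no quotes, apostrophes, or backslashes of its own, and it does not check word boundaries.

```python
import re
import shlex


def careful_split(inputstr, donot_split=("name is",)):
    """Split on whitespace, keeping each phrase in donot_split as one token."""
    # Longest phrases first, so overlapping phrases prefer the longer match.
    phrases = sorted(donot_split, key=len, reverse=True)
    # (?<!\S) and (?!\S) require whitespace or a string edge around the
    # phrase, so "name isn't" is still split into two tokens.
    protected = "|".join(
        r"(?<!\S)" + re.escape(p) + r"(?!\S)" for p in phrases
    )
    # Try a protected phrase first; otherwise take a run of non-whitespace.
    pattern = protected + r"|\S+" if protected else r"\S+"
    return re.findall(pattern, inputstr)


def careful_split_shlex(inputstr, donot_split=("name is",)):
    """Rough shlex variant: quote each protected phrase, then let shlex split."""
    quoted = inputstr
    for phrase in donot_split:
        # Plain substring replacement: assumes the input has no quotes,
        # apostrophes, or backslashes, and does not check word boundaries.
        quoted = quoted.replace(phrase, shlex.quote(phrase))
    return shlex.split(quoted)


mystr = "my name is some good name"
print(careful_split(mystr))
print(careful_split_shlex(mystr))
```

Both calls should give the list from the question for this input. The regex version matches each phrase literally, so a phrase written with two spaces in the input would not be protected.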

modesitt