I've been trying to get this regex working in Python, but it isn't producing the expected result.
I load a text file whose contents are in the following format:
"18 75 19\n!dont split here\n! but split here\n* and split here"
I'd like to get the following output:
['18 75 19\n!dont split here',
'! but split here',
'* and split here']
I'm trying to split my string on either 1) a newline followed by a number, or 2) a newline followed by a special character, but only if that character is followed by a space (e.g. '! but split here', but not '!dont split here').
Here's what I have so far:
re.split(u'\n(?=[0-9]|([`\-=~!@#$%^&*()_+\[\]{};\'\\:"|<,./<>?])(?= ))', str)
This is close, but not there yet. Here's the output it produces:
['18 75 19\n!dont split here', '!', '! but split here', '*', '* and split here']
It incorrectly returns the special character as a separate element: '!' and '*' each get their own entry in the list. There are two lookahead operators in the regex.
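For reference, here is a minimal reproduction. The re.split documentation says that when the pattern contains capturing parentheses, the text of those groups is also returned in the result, which may be what is happening here. The sketch below compares the pattern above with a variant that uses a non-capturing group. The names text, special, capturing and non_capturing are only for illustration, and the character class is rewritten as a raw string on the assumption that it is equivalent to the original.

```python
import re

text = "18 75 19\n!dont split here\n! but split here\n* and split here"

# The character class from the pattern above, written as a raw string
# (assumed equivalent to the original non-raw version).
special = r"""[`\-=~!@#$%^&*()_+\[\]{};'\\:"|<,./>?]"""

# Capturing group inside the lookahead: re.split also returns the captured text.
capturing = re.split(r"\n(?=[0-9]|(" + special + r")(?= ))", text)

# Same pattern, but with a non-capturing group (?:...) instead.
non_capturing = re.split(r"\n(?=[0-9]|(?:" + special + r")(?= ))", text)

print(capturing)
print(non_capturing)
```

With the sample string, the first call reproduces the list with the stray '!' and '*' elements, while the second returns the three elements I'm after.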
I'd really appreciate it if you could help identify what I could change in this regex so that it doesn't return the special character on its own, and only returns it together with the rest of its line.
I'm also open to alternatives. If there's a better way that doesn't involve two lookaheads, I'd also be interested to understand other ways to tackle this problem.
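To make the kind of alternative I have in mind concrete, here is a rough sketch that avoids regex entirely: it walks the lines and starts a new chunk whenever a line begins with a digit, or with a punctuation character followed by a space. The function name split_records is made up for this example, and it assumes that string.punctuation covers the same characters as my character class (it appears to, but I haven't checked every edge case).

```python
import string

DIGITS = set(string.digits)
# Assumption: string.punctuation matches the special characters in my regex.
PUNCTUATION = set(string.punctuation)


def split_records(text):
    """Group lines into chunks according to the two split rules."""
    chunks = []
    for line in text.split("\n"):
        first, second = line[:1], line[1:2]
        starts_chunk = first in DIGITS or (first in PUNCTUATION and second == " ")
        if starts_chunk or not chunks:
            # The very first line always opens a chunk.
            chunks.append(line)
        else:
            # Otherwise the line belongs to the previous chunk.
            chunks[-1] += "\n" + line
    return chunks


text = "18 75 19\n!dont split here\n! but split here\n* and split here"
print(split_records(text))
```

This gives the desired three-element list for the sample string, but I don't know whether it is considered better practice than a single regex.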
Thanks!