Given the string:

I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' .

The goal is to catch:

'v 
'v
'w

But avoid 've and 'll and 't.

I've tried to catch the 've and 'll and 't with (?i)\'(?:ve|ll|t)\b , e.g.

>>> import re
>>> x = "I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' ."
>>> pattern = r"(?i)\'(?:ve|ll|t)\b"
>>> re.findall(pattern, x)
["'ll", "'ve", "'t"]

I've also tried to negate the non-capturing group in (?i)\'(?:ve|ll|t)\b by writing (?i)\'[^(?:ve|ll|t)]\b, but that didn't catch the 'v and 'w tokens I'm after.

How do I catch the substrings that follow a single quote but aren't in a list of predefined substrings, i.e. 'll, 've and 't?


I've also tried this, which didn't work either:

pattern = r"(?i)\'(?:[^ve|ll|t|\s])\b"

but [^...] only matches single characters, not substrings.
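
A minimal sketch of why this happens, using a made-up sample string: inside [...] the | is just a literal character, so [^ve|ll|t] means "any one character other than v, e, |, l or t" rather than "anything except ve, ll or t".

```python
import re

# Hypothetical sample string, chosen only to show how the class behaves.
sample = "vet|lx"

# [^ve|ll|t] is a negated character class, not a negated alternation:
# it rejects the single characters v, e, |, l and t.
single_chars = re.findall(r"[^ve|ll|t]", sample)
print(single_chars)  # ['x']
```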

alvas

2 Answers

Maybe this one will work?

\'(?!ve|ll|t|\s)\w+

You can use a negative lookahead assertion to filter out what you don't want.
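
As a rough sketch of how that pattern behaves on the string from the question (Python's re module is assumed, and the second pattern is an illustrative variation rather than part of the pattern above):

```python
import re

x = ("I'll be going home I've the 'v ' isn't want I want to split "
     "but I want to catch tokens like 'v and 'w ' .")

# (?!ve|ll|t|\s) checks the text right after the quote without consuming it;
# the match proceeds only when none of the alternatives is found there.
pattern = r"'(?!ve|ll|t|\s)\w+"
matches = re.findall(pattern, x)
print(matches)  # ["'v", "'v", "'w"]

# Assumed refinement, not part of the original pattern: a \b inside the
# lookahead rejects only the exact suffixes, so 'tv or 'very still match.
strict = r"'(?!(?:ve|ll|t)\b)\w+"
strict_matches = re.findall(strict, "'tv isn't 'very")
print(strict_matches)  # ["'tv", "'very"]
```

Note that the plain (?!ve|ll|t) form also rejects any token that merely starts with ve, ll or t, which may or may not be what you want.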

update

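Some regex engines require certain assertions to have a fixed length. This restriction typically applies to lookbehind rather than lookahead.

In an engine with that restriction, an assertion containing ve|t would be rejected because ve and t have different lengths.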
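
For Python specifically, a quick check along these lines suggests that the re module accepts alternatives of different lengths in a lookahead and only enforces a fixed width for lookbehind. The helper name compiles is made up for this sketch:

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Lookahead with alternatives of different lengths: accepted.
lookahead_ok = compiles(r"'(?!ve|t)\w+")
# Lookbehind with alternatives of different lengths: rejected by re.
lookbehind_ok = compiles(r"(?<=ve|t)\w+")
print(lookahead_ok, lookbehind_ok)  # True False
```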

Sraw

The syntax for a negative lookahead is (?!...), so replacing the non-capturing group with one gives something like (?i)\'(?!ve|ll|t)\w\b:

>>> import re
>>> pattern = r"(?i)\'(?!ve|ll|t)\w\b"
>>> x = "I'll be going home I've the 'v ' isn't want I want to split but I want to catch tokens like 'v and 'w ' ."
>>> re.findall(pattern, x)
["'v", "'v", "'w"]
alvas