Objective:
I'm looking for a way to match or skip words based on whether or not they are surrounded by quotation marks ' ', guillemets « » or parentheses ( ).
Examples of desired results:
len(re.findall("my word", "blablabla 'my word' blablabla")) should return 0, because linguistically speaking my word =/= 'my word' and hence shouldn't be matched;
len(re.findall("'my word'", "blablabla 'my word' blablabla")) should return 1, because linguistically speaking 'my word' = 'my word' and hence should be matched;
But here's the catch: both len(re.findall("my word", "blablabla «my word» blablabla")) and len(re.findall("my word", "blablabla (my word) blablabla")) should return 1.
My attempt:
I have the following expression (correct me if I'm wrong) at my disposal but am clueless as to how to implement it: (?<!\w)'[^ ].*?\w*?[^ ]'
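To make clear what that expression does on its own, here is a minimal sketch that runs it against the example sentence. The variable names quoted_span and sentence are only illustrative; the pattern is the one quoted above, unchanged.

```python
import re

# The expression from the question: an opening ' that does not follow a
# word character, some content that neither starts nor ends with a space,
# then a closing '.
quoted_span = r"(?<!\w)'[^ ].*?\w*?[^ ]'"

sentence = "blablabla 'my word' blablabla"
spans = re.findall(quoted_span, sentence)
print(spans)  # ["'my word'"]

# The lookbehind keeps the apostrophe in "don't" from opening a span.
spans_with_contraction = re.findall(quoted_span, "don't say 'my word' here")
print(spans_with_contraction)  # ["'my word'"]
```

So the expression finds the quoted spans themselves; it does not say by itself whether a given word sits inside one.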
I wish to make the following code take into account all of the aforementioned situations: len(re.findall(r'(?<!\w)'+re.escape(myword)+r'(?!\w)', sentence)). Its aim, I believe, is to match the word regardless of the punctuation marks next to it.
For now, my code detects my word inside of 'my word', which is not what I want.
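The behaviour described above can be reproduced with the sketch below, next to one possible adjustment. The function names count_current and count_unquoted are hypothetical, and the adjustment (adding ' to both lookarounds) is only an assumption about what might satisfy the examples, not a confirmed solution.

```python
import re

def count_current(myword, sentence):
    # Pattern from the question: rejects only adjacent word characters.
    pattern = r"(?<!\w)" + re.escape(myword) + r"(?!\w)"
    return len(re.findall(pattern, sentence))

def count_unquoted(myword, sentence):
    # Hypothetical variant: also rejects an adjacent straight quote (').
    pattern = r"(?<![\w'])" + re.escape(myword) + r"(?![\w'])"
    return len(re.findall(pattern, sentence))

quoted = "blablabla 'my word' blablabla"
print(count_current("my word", quoted))     # 1 (the unwanted match)
print(count_unquoted("my word", quoted))    # 0
print(count_unquoted("'my word'", quoted))  # 1
print(count_unquoted("my word", "blablabla «my word» blablabla"))  # 1
print(count_unquoted("my word", "blablabla (my word) blablabla"))  # 1
```

One limitation of this variant: because any adjacent ' blocks the match, a possessive such as my word's would be skipped as well.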
Thanks in advance!