RegEx match word in string containing + and - using re.findall() Python

Question

myreg = r"\babcb\"

mystr = "sdf ddabc"

mystr1 = "sdf abc"

print(re.findall(myreg, mystr))   # []

print(re.findall(myreg, mystr1))  # ['abc']

So far everything works as expected, but not when I change my regex and my string to:

myreg = r"\b\+abcb\"

mystr = "sdf +abc"

print(re.findall(myreg, mystr))   # [], but I would like to get ['+abc']

I have noticed that using the following works as expected.

   myreg = "^\\+abc$"

   mystr = "+abc"   

   mystr1 = "-+abc"

My question: Is it possible to achieve the same results as above without splitting the string?

Best regards,

Gabriel

zx81 · Answer 1 · 2014-06-13T23:22:46.553

There are two problems:

Before your + in +abc, there is no word boundary, so \b cannot match.
Your regex \b\+abcb\ tries to match a literal b character after abc (typo).

Word Boundaries

The word boundary \b matches at a position between a word character (a letter, digit or underscore) and a non-word character (or the start or end of the string). For instance, in "sdf +abc" there is a word boundary between the + and the a, but none between the space and the +, because both are non-word characters.
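To make this concrete, here is a small sketch that lists every position where \b matches in the question's string. The variable names are only illustrative.

```python
import re

text = "sdf +abc"

# \b is zero-width, so each match is an empty string; we only keep its position.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 3, 5, 8]

# Index 4 (between the space and the '+') is missing from the list,
# which is why a pattern starting with \b\+ can never match here.
print(re.findall(r"\b\+abc\b", text))  # []
```

Positions 0, 3, 5 and 8 are the start of "sdf", the end of "sdf", the start of "abc" and the end of "abc".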

Solution: Make Your Own Boundary

If you want to match +abc but only when it is not preceded by a word character (for instance, you don't want it inside def+abc), then you can make your own boundary with a lookbehind:

(?<!\w)\+abc

This says "match +abc if it is not preceded by a word character (letter, digit, underscore)".
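A minimal sketch of the lookbehind with re.findall follows. The trailing \b is my addition, on the assumption that you also want abc to end at a word boundary as in your original pattern; the test strings are made up for illustration.

```python
import re

# Lookbehind from above, plus a trailing \b (an assumption, see the note above).
pattern = re.compile(r"(?<!\w)\+abc\b")

results = {
    "sdf +abc": pattern.findall("sdf +abc"),    # preceded by a space
    "+abc": pattern.findall("+abc"),            # at the start of the string
    "def+abc": pattern.findall("def+abc"),      # preceded by a word character
    "sdf +abcd": pattern.findall("sdf +abcd"),  # no word boundary after "abc"
}

for text, found in results.items():
    print(repr(text), found)
# 'sdf +abc' ['+abc']
# '+abc' ['+abc']
# 'def+abc' []
# 'sdf +abcd' []
```

Note that a negative lookbehind also succeeds at the very start of the string, since nothing precedes the + there.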

score 0 · Answer 2 · answered Jun 13 '14 at 23:06

Your problem is the following:

\b is defined as the boundary between a \w and a \W character (or vice versa).
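\w contains the character set [a-zA-Z0-9_] (in ASCII mode; for Python 3 str patterns it also covers Unicode letters and digits unless re.ASCII is set)
\W is the complement of \w, i.e. [^a-zA-Z0-9_] in ASCII mode, which means all characters except [a-zA-Z0-9_]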

'+' is not contained in \w, and neither is the whitespace before it, so there is no boundary between the whitespace and the '+' for \b to match.

To get what you want, you should remove the first \b from your pattern:

import re

string = "sdf +abc"
pattern = r"\+abc\b"
matches = re.findall(pattern, string)

print(matches)  # ['+abc']
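One caveat worth checking for your data: without any leading condition, this pattern also matches +abc inside a longer token. The sketch below compares it with a stricter variant that assumes tokens are separated by whitespace, which is my guess at what your ^\+abc$ test after splitting was doing. The names loose and strict are only illustrative.

```python
import re

loose = re.compile(r"\+abc\b")              # the pattern from this answer
strict = re.compile(r"(?<!\S)\+abc(?!\S)")  # assumption: whitespace-delimited tokens

comparison = {}
for text in ["sdf +abc", "def+abc", "-+abc"]:
    comparison[text] = (loose.findall(text), strict.findall(text))
    print(repr(text), comparison[text])
# 'sdf +abc' (['+abc'], ['+abc'])
# 'def+abc' (['+abc'], [])
# '-+abc' (['+abc'], [])
```

(?<!\S) and (?!\S) mean "not preceded by" and "not followed by" a non-whitespace character, so they also succeed at the start and end of the string.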
