
myreg = r"\babcb\"

mystr = "sdf ddabc"

mystr1 = "sdf abc"

print(re.findall(myreg, mystr))   # []

print(re.findall(myreg, mystr1))  # ['abc']

Up to this point everything works as expected, but not once I change my regex and my string to:

myreg = r"\b\+abcb\"

mystr = "sdf +abc"

print(re.findall(myreg, mystr)) returns [], but I would like to get ['+abc'].

I have noticed that using the following works as expected.

   myreg = "^\\+abc$"

   mystr = "+abc"   

   mystr1 = "-+abc"

My question: Is it possible to achieve the same results as above without splitting the string?

Best regards,

Gabriel

gabrielK

2 Answers


There are two problems:

  1. Before your + in +abc, there is no word boundary, so \b cannot match.
  2. Your regex \b\+abcb\ tries to match a literal b character after abc (typo). The ending was presumably meant to be \b, as in \b\+abc\b.

Word Boundaries

The word boundary \b matches at a position between a word character (letter, digit, or underscore) and a non-word character (or the beginning or end of the string). For instance, in "sdf +abc" there is a word boundary between the + and the a, but none between the space and the +, because both of those are non-word characters.
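To make this concrete, here is a small sketch that lists every position in the asker's string where \b succeeds. The variable names text and boundaries are illustrative only.

```python
import re

text = "sdf +abc"

# Collect every index at which the zero-width \b assertion succeeds.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 3, 5, 8]

# Index 4 (between the space and the "+") is missing from the list:
# both neighbours are non-word characters, so \b fails right where
# the pattern \b\+abc would need it.
print(re.findall(r"\b\+abc", text))  # []
```

Index 5 is the boundary between the + and the a, which is why a pattern such as \+\babc would match while \b\+abc does not.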

Solution: Make Your Own Boundary

If you want to match +abc but only when it is not preceded by a word character (for instance, you don't want it inside def+abc), then you can make your own boundary with a lookbehind:

(?<!\w)\+abc

This says "match +abc if it is not preceded by a word character (letter, digit, underscore)".
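A quick way to check the lookbehind is to run it against the strings from the question plus the def+abc case. This is only a sketch: the names samples, results, and strict are made up for illustration, and the trailing \b in the second pattern is an assumption about what the asker wants at the end of the match, not part of the pattern above.

```python
import re

# The lookbehind pattern from this answer.
pattern = re.compile(r"(?<!\w)\+abc")

samples = ["sdf +abc", "+abc", "-+abc", "def+abc"]
results = {s: pattern.findall(s) for s in samples}
for s, found in results.items():
    print(repr(s), found)
# 'sdf +abc' ['+abc']
# '+abc' ['+abc']
# '-+abc' ['+abc']
# 'def+abc' []

# Assumption: if "+abcd" should be rejected as well, a trailing \b
# can be added, since c is a word character and \b works there.
strict = re.compile(r"(?<!\w)\+abc\b")
print(pattern.findall("sdf +abcd"))  # ['+abc']
print(strict.findall("sdf +abcd"))   # []
```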

zx81

Your problem is the following:

  • \b is defined as the boundary between a \w and a \W character (or vice versa).
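  • \w contains the character set [a-zA-Z0-9_] (in Python 3, str patterns also match other Unicode word characters unless the re.ASCII flag is set)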
  • \W contains the character set [^a-zA-Z0-9_], which means all characters except [a-zA-Z0-9_]

'+' is not contained in \w, and neither is whitespace, so there is no boundary between the whitespace and the '+' for \b to match.

To get what you want, you should remove the first \b from your pattern:

import re

string = "sdf +abc"
pattern = r"\+abc\b"
matches = re.findall(pattern, string)

print(matches)
# ['+abc']
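
One thing to keep in mind with this approach: once the leading \b is removed, nothing constrains what comes before the +. The sketch below reuses the pattern above on two extra inputs, which are made up for illustration.

```python
import re

pattern = r"\+abc\b"

# The case from the question.
print(re.findall(pattern, "sdf +abc"))   # ['+abc']

# Also matches when "+abc" is glued to a preceding word.
print(re.findall(pattern, "def+abc"))    # ['+abc']

# The trailing \b still rejects a longer word after the "+".
print(re.findall(pattern, "sdf +abcd"))  # []
```

Whether the second result is acceptable depends on the input you expect.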
miindlek