Why does the letter é count as a word boundary matching \b in the following example?

Pattern: /\b(cum)\b/i

Text: écumé

This matches 'cum', which is not desired.

Is it possible to overcome this?

marekful

2 Answers

It will work when you add the u modifier to your regex, which makes the pattern treat the text as Unicode so that é counts as a word character:

/\b(cum)\b/iu
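
The difference between an ASCII-only and a Unicode-aware \b can be reproduced in other engines too. The following is a minimal sketch in Python, chosen only for illustration since the question does not name a language. It assumes that Python's re.ASCII flag is a fair stand-in for running the pattern without the u modifier, and that the é in the text is the single precomposed code point U+00E9.

```python
import re

# "écumé" written with explicit escapes so é is the precomposed code point U+00E9.
text = "\u00e9cum\u00e9"

# ASCII-only word characters: é is not a word character here,
# so \b finds a boundary on both sides of "cum".
ascii_match = re.search(r"\b(cum)\b", text, re.IGNORECASE | re.ASCII)

# Unicode-aware word characters (the default for str patterns in Python 3):
# é is a word character, so there is no boundary around "cum".
unicode_match = re.search(r"\b(cum)\b", text, re.IGNORECASE)

print(ascii_match)
print(unicode_match)
```

The first search finds 'cum' and the second finds nothing, which mirrors the behaviour with and without the u modifier.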
stema

To deal with Unicode, replace \b with lookarounds that require a non-letter (\PL) or the start or end of the string on each side:

/(?<=^|\PL)(cum)(?=\PL|$)/i
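
The same idea can be sketched in Python for illustration. Python's built-in re module supports neither \PL nor a lookbehind with alternatives of different widths, so this sketch swaps in negative lookarounds with the class [^\W\d_], which only approximates "a letter". The names LETTER and pattern are hypothetical, and the text uses the precomposed é (U+00E9).

```python
import re

# Approximation of "a letter": a word character that is neither a digit nor "_".
LETTER = r"[^\W\d_]"

# "cum" must not be preceded or followed by a letter.
# Start and end of string pass automatically because nothing is there.
pattern = re.compile(rf"(?<!{LETTER})(cum)(?!{LETTER})", re.IGNORECASE)

in_word = pattern.search("\u00e9cum\u00e9")   # letters on both sides
standalone = pattern.search("Cum laude")      # start of string, then a space
between_digits = pattern.search("1cum2")      # digits are not letters

print(in_word, standalone, between_digits)
```

Unlike \b, this variant also matches 'cum' between digits, because only letters count as part of a word here.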
Toto
    Thanks, but it seems too complex. stema's answer addresses the issue more straightforwardly, as the problem is that the text has Unicode characters but the pattern is not made aware of this. – marekful Feb 27 '14 at 12:44