
My string is in a non-Latin script (Armenian), and I use the following regular expression:

$str = 'մի քանի Բառ ձեր մասին';
$word = 'բառ';

$cont = preg_match_all("/.{0,80}[^\s]*?".preg_quote($word, '/')."[^\s]*?.{0,80}/si",$str,$matched);
print_r($matched); // prints Array ( [0] => Array ( ) ), i.e. no match


...but if I set:

$word = "Բառ"; // prints Array ( [0] => Array ( [0] => մի քանի Բառ ձեր մասին ) )

What can I do to make the `i` (case-insensitive) modifier work for non-Latin text as well?

Makoto
Simon
  • What is the purpose of this code? Are you trying to extract the word from a text plus its surrounding words? – Gumbo Aug 24 '10 at 09:54
  • @Gumbo Exactly. I'm trying to extract the word and its surrounding words, even when it only appears as part of a longer word in the string. What do you think of this approach? – Simon Aug 24 '10 at 09:57
  • I would rather split the text into words, find the words that are or contain the wanted word, and then get the surrounding words. Or if you want to use `preg_match_all`, just search for the wanted word and use the `PREG_OFFSET_CAPTURE` flag to get the offsets for `substr` (see http://stackoverflow.com/questions/3306513). – Gumbo Aug 24 '10 at 10:09
  • OK, but as far as I know, if I use `PREG_OFFSET_CAPTURE` to get the offsets too, it will return an empty result if the available text is shorter than the offset I specify. Am I correct? I.e., if I set `offset=30` but there are only 29 characters, will it return an empty result? – Simon Aug 24 '10 at 10:16
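
The split-into-words approach from the comments could look roughly like the sketch below. It is only an illustration: `words_around()` and its `$radius` parameter are hypothetical names, not something from the thread, and it assumes PHP 7 or later and UTF-8 input. The window is clamped at the start of the text and `array_slice()` simply stops at the end, so a short string still gives a result rather than an empty one.

```php
<?php
// Hypothetical helper (not from the thread): returns each word that contains
// $needle, together with up to $radius neighbouring words on either side.
function words_around(string $text, string $needle, int $radius = 2): array
{
    // Split on whitespace; "u" makes PCRE treat the text as UTF-8.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // "i" together with "u" gives case-insensitive matching beyond ASCII.
    $pattern = '/' . preg_quote($needle, '/') . '/iu';

    $snippets = [];
    foreach ($words as $i => $w) {
        if (preg_match($pattern, $w) === 1) {
            // Clamp the window at the start of the text;
            // array_slice() already stops quietly at the end.
            $start  = max(0, $i - $radius);
            $length = $i + $radius - $start + 1;
            $snippets[] = implode(' ', array_slice($words, $start, $length));
        }
    }
    return $snippets;
}

print_r(words_around('մի քանի Բառ ձեր մասին', 'բառ', 1));
```

Working in whole words also avoids the byte-versus-character question that comes up when mixing regex offsets with `substr`.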

1 Answer


Try adding the u modifier:

$cont = preg_match_all("/.{0,80}[^\s]*?".preg_quote($word, '/')."[^\s]*?.{0,80}/siu",$str,$matched);
Alix Axel
  • Perfect, thanks a lot. Could you explain why the u modifier affects how other languages are handled? As far as I know, it only inverts the greediness. – Simon Aug 24 '10 at 09:55
  • @Syom: `U` (uppercase) makes quantifiers non-greedy by default, while `u` (lowercase) makes PCRE treat the pattern and subject as UTF-8 encoded. See http://php.net/reference.pcre.pattern.modifiers. – Gumbo Aug 24 '10 at 10:14
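
To see the effect of `u` in isolation, here is a minimal sketch using the data from the question. It assumes the script file itself is saved as UTF-8 and that PCRE was built with Unicode support, which is the case for PHP's bundled PCRE. The variable names `$withoutU` and `$withU` are made up for the illustration.

```php
<?php
// Sample data from the question (the file must be saved as UTF-8).
$str  = 'մի քանի Բառ ձեր մասին';
$word = 'բառ'; // lowercase, while the text contains the capitalized "Բառ"

// Without "u", PCRE works byte by byte, so "i" does not know that
// "բ" and "Բ" are the same letter; the question reports no match here.
$withoutU = preg_match('/' . preg_quote($word, '/') . '/i', $str);

// With "u", pattern and subject are read as UTF-8 and "i" folds case
// for non-ASCII letters as well.
$withU = preg_match('/' . preg_quote($word, '/') . '/iu', $str, $m);

var_dump($withoutU);
var_dump($withU);    // int(1)
var_dump($m[0]);     // the text as it appears in $str: "Բառ"
```

Passing `'/'` as the second argument to `preg_quote` is not needed for this particular word, but it keeps the pattern safe if the search term ever contains the delimiter.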