i modifier doesn't work with foreign languages?

Question

My string is in a foreign language. I use the following regular expression:

$str = 'մի քանի Բառ ձեր մասին';
$word = 'բառ';

$cont = preg_match_all("/.{0,80}[^\s]*?".preg_quote($word)."[^\s]*?.{0,80}/si",$str,$matched);
print_r($matched); // returns Array ( [0] => Array ( ) )

...but if I set:

$word = "Բառ"; // returns Array ( [0] => Array ( [0] => մի քանի Բառ ձեր մասին ) )

What can I do to make the i modifier work with foreign languages too?

What is the purpose of this code? Are you trying to extract the word from a text plus its surrounding words? — Gumbo, Aug 24 '10 at 09:54
@Gumbo Exactly. I'm trying to extract the word and its surrounding words, even if it only appears as a subword in the string. What do you think of such a solution? — Simon, Aug 24 '10 at 09:57
I would rather split the text into words, find the words that are or contain the wanted word, and then get the surrounding words. Or if you want to use `preg_match_all`, just search for the wanted word and use the `PREG_OFFSET_CAPTURE` flag to get the offsets for `substr` (see http://stackoverflow.com/questions/3306513). — Gumbo, Aug 24 '10 at 10:09
OK, but as far as I know, if I use `PREG_OFFSET_CAPTURE` to get the offsets too, it will return an empty result if the text is shorter than the offset I specify. Am I correct? I.e., if I set `offset=30` but there are only 29 characters, will it return an empty result? — Simon, Aug 24 '10 at 10:16
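
A rough sketch of the approach Gumbo suggests in the comments above: search only for the word, then cut the surrounding context out by offset. The helper name `contextAround` and its `$radius` parameter are made up for illustration, and the code assumes the mbstring extension is available. `PREG_OFFSET_CAPTURE` only reports where each match starts (in bytes, even with the u modifier), so it does not by itself make a match fail on short text.

```php
<?php
// Hypothetical helper (not from the thread): returns every match of $word
// together with up to $radius characters of context on each side.
function contextAround(string $str, string $word, int $radius = 80): array
{
    // i = case-insensitive, u = treat pattern and subject as UTF-8
    $pattern = '/' . preg_quote($word, '/') . '/iu';
    preg_match_all($pattern, $str, $matches, PREG_OFFSET_CAPTURE);

    $snippets = [];
    foreach ($matches[0] as [$text, $byteOffset]) {
        // Offsets are reported in bytes, so convert to a character offset first.
        $charOffset = mb_strlen(substr($str, 0, $byteOffset), 'UTF-8');
        $start  = max(0, $charOffset - $radius);
        $length = ($charOffset - $start) + mb_strlen($text, 'UTF-8') + $radius;
        // mb_substr returns whatever is available if the text is shorter than requested.
        $snippets[] = mb_substr($str, $start, $length, 'UTF-8');
    }
    return $snippets;
}

$str = 'մի քանի Բառ ձեր մասին';
print_r(contextAround($str, 'բառ'));
```

Because the start is clamped with `max(0, ...)` and `mb_substr` stops at the end of the string, a text shorter than the requested radius should still yield the part that exists rather than an empty result.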

score 5 · Accepted Answer · answered Aug 24 '10 at 09:51

Try adding the u modifier:

$cont = preg_match_all("/.{0,80}[^\s]*?".preg_quote($word, '/')."[^\s]*?.{0,80}/siu",$str,$matched);

Alix Axel

Perfect, thanks a lot. Could you explain why the u modifier has an influence on language? As far as I know, it only inverts the greediness? – Simon Aug 24 '10 at 09:55
@Syom: `U` (uppercase) is for a non-greedy match, and `u` (lowercase) is for interpreting the pattern as UTF-8 encoded. See http://php.net/reference.pcre.pattern.modifiers. – Gumbo Aug 24 '10 at 10:14
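
To make the difference concrete, here is a minimal sketch that reuses the strings from the question and compares the i modifier with and without u. The variable names `$withoutU` and `$withU` are only illustrative. Without u, the string is matched byte by byte, so case-insensitivity does not reach multibyte characters such as Armenian letters.

```php
<?php
$str  = 'մի քանի Բառ ձեր մասին';
$word = 'բառ'; // lowercase; the text contains 'Բառ' with a capital first letter

$pattern = '/' . preg_quote($word, '/') . '/';

// i alone: the subject is treated as raw bytes, so the capital letter is not folded.
$withoutU = preg_match($pattern . 'i', $str);

// i plus u: pattern and subject are treated as UTF-8, so case folding covers Armenian.
$withU = preg_match($pattern . 'iu', $str, $m);

var_dump($withoutU); // int(0)
var_dump($withU);    // int(1)
```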
