2

I have an Arabic keyword stored in a MySQL table like this:

    *#1591; *#1610; *#1585;*#1575;*#1606

(Please read & in place of *; with '&' the value is automatically rendered as Arabic.)

MySQL table collation: utf8_general_ci

I am getting strings from external sources, for example Twitter.

I would like to match the keyword against each tweet I receive.

    $tweet = 'وينج وأداسي الاماراتية توقعان اتفاقية تعاون لتوفير أنظمة الطائرات بدون طيا';
    $keyword = '*#1591; *#1610; *#1585;*#1575;*#1606'; // from the DB
    $status = strpos($tweet, $keyword);

$status is always false.

I have tried utf8_encode(), utf8_decode() and mb_strpos() without any luck.

I know I need to convert both strings to one common format before comparing them, but which format should I convert them to?

Please help me on this.

Diego Agulló
Samy
1 Answer

3

As Arabic characters are encoded as multibyte sequences, you should use functions that are multibyte-aware: grapheme_strpos and mb_strpos (in that order of preference).

Using them instead of plain old strpos will do the trick.

Also, keep in mind that you may have to check that they are available before using them, as not all hosting environments have them enabled:

if (function_exists('grapheme_strpos')) {
    $pos = grapheme_strpos($tweet, $keyword);
} elseif (function_exists('mb_strpos')) {
    $pos = mb_strpos($tweet, $keyword);
} else {
    $pos = strpos($tweet, $keyword);
}

And last but not least, check the docs for the arguments these functions take, such as the encoding used by the strings.
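As a minimal sketch of that last point, the call below passes the encoding explicitly and compares the result with `!== false`, since a match at offset 0 would otherwise be mistaken for a miss. The helper name `contains_keyword` and the sample strings are made up for illustration, and the sketch assumes both strings are already UTF-8.

```php
<?php
// Hypothetical helper (not from the question): true if $needle occurs in $haystack.
// Assumes both strings are UTF-8 encoded.
function contains_keyword(string $haystack, string $needle): bool
{
    if (function_exists('mb_strpos')) {
        // The fourth argument names the encoding instead of relying on the default.
        return mb_strpos($haystack, $needle, 0, 'UTF-8') !== false;
    }
    // Byte-wise fallback: it still detects a match in valid UTF-8,
    // but strpos() reports offsets in bytes, not characters.
    return strpos($haystack, $needle) !== false;
}

// Illustrative sample text, not taken from the question.
$tweet = 'طيران بدون طيار';

var_dump(contains_keyword($tweet, 'طيران')); // keyword sits at offset 0
var_dump(contains_keyword($tweet, 'سلاح'));  // keyword is absent
```

The strict comparison matters because both `strpos` and `mb_strpos` return the integer 0 for a match at the very start of the string, which a loose `if ($pos)` check would treat as false.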

Diego Agulló
  • Thanks. How do I use mb_strpos('raw arabic text', 'utf8 encoded keyword')? The two strings are in different formats. – Samy Feb 18 '13 at 11:37
  • Do I need to convert 'raw arabic text' to UTF-8 before comparing? I have tried mb_strpos as well, and it is not giving the desired output. – Samy Feb 18 '13 at 11:43
  • The raw arabic text is fine. You may have to convert the html entities, though: `$keyword = html_entity_decode($keyword);` (provided that you replace the asterisks with ampersands). – Diego Agulló Feb 18 '13 at 11:45
  • `$tweet = 'كشف قائد ميداني بالجيش الحر ان الدول الغربيةتقوم بمساومةالجيش على مقاتلة الكتائب الإسلاميةمن أجل إمداده بالسلاح والعتاد'; $keyword = 'بمساومةالجيش'; $s = mb_strpos($tweet, html_entity_decode(trim($keyword)), 0, 'UTF-8'); var_dump($s);` It always returns false. I have taken the last word from this tweet as a keyword, so it should match, but it still returns false. "كشف قائد ميداني": these strings are keywords. – Samy Feb 18 '13 at 12:01
  • Apparently it isn't converting the character `&#1588;`. Dump the `html_entity_decode`'d text to see it; it may help you to debug your string. Removing that character makes it work as expected! – Diego Agulló Feb 18 '13 at 12:10
  • Thanks for your valuable input. It is still not working. I have removed the character ش. I echoed $keyword and searched for it in the tweet with my editor, which shows a match, but it does not match through PHP. What am I missing here? It has really consumed my whole day and I still can't fix it :-( – Samy Feb 18 '13 at 12:26
  • Hi, I have solved the issue. I couldn't find out why the comparison fails. What I did was change the MySQL character set to UTF-8, so that I can store the Arabic keyword without any entity encoding. Then I compare using plain strpos. My server doesn't have mb_strpos either, but strpos works fine. Thanks for your help. – Samy Feb 19 '13 at 12:54
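
For reference, the entity-decoding route suggested in the comments can be sketched roughly as follows. It assumes a cleaned-up form of the stored keyword (real ampersands instead of asterisks, no spaces, every entity terminated with `;`) and a tweet that is already raw UTF-8. The sample tweet and the variable names are illustrative only.

```php
<?php
// Keyword as it might be stored in the DB: numeric HTML entities.
// Assumption: real ampersands, no spaces, each entity closed with ';'.
$stored = '&#1591;&#1610;&#1585;&#1575;&#1606;';

// Decode to raw UTF-8 so both sides of the comparison share one representation.
$keyword = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

// Illustrative sample tweet in raw UTF-8, not taken from the question.
$tweet = 'شركة طيران جديدة';

// With both strings in UTF-8, plain strpos() is enough to detect the match.
// Its offset is counted in bytes; use mb_strpos() if a character offset is needed.
$found = strpos($tweet, $keyword) !== false;

var_dump($keyword === 'طيران', $found);
```

Storing the keyword as raw UTF-8 in the database, as described in the last comment, removes the decoding step entirely.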