While @Artier's answer might work acceptably, it's not a great idea to have loose UTF-8 combining marks in the source code and, from what I've gleaned from Google, that answer may not cover the entire range of Arabic diacritics/combining marks.
Disclaimer: I know very little about Arabic, but I am very fussy about UTF-8.
@Artier's answer seems to have been culled from the accepted answer on this question, but an accepted answer is frequently not the optimal solution. One of these other two options from the same set of answers is likely closer to being canonically correct.
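// Option 1: strip the combining marks U+064B through U+065B.
// Single quotes pass \x{...} to PCRE untouched; the u modifier enables UTF-8.
function strip_arabic_diacritics_1($str) {
    return preg_replace('~[\x{064B}-\x{065B}]~u', '', $str);
}

// Option 2: strip several wider ranges of the Arabic block.
function strip_arabic_diacritics_2($str) {
    $ranges = [
        '~[\x{0600}-\x{061F}]~u',
        '~[\x{063B}-\x{063F}]~u',
        '~[\x{064B}-\x{065E}]~u',
        '~[\x{066A}-\x{06FF}]~u',
    ];
    // An array of patterns is applied in order to the same subject.
    return preg_replace($ranges, '', $str);
}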
$str="اِنَّ الَّذِیۡنَ اٰمَنُوۡا وَ عَمِلُوا الصّٰلِحٰتِ وَ اَخۡبَتُوۡۤا اِلٰی رَبِّہِمۡ ۙ اُولٰٓئِکَ اَصۡحٰبُ الۡجَنَّۃِ ۚ ہُمۡ فِیۡہَا خٰلِدُوۡنَ";
$ptr="عملوا";
var_dump(
    $str,
    strip_arabic_diacritics_1($str),
    strip_arabic_diacritics_2($str)
);
Output:
string(265) "اِنَّ الَّذِیۡنَ اٰمَنُوۡا وَ عَمِلُوا الصّٰلِحٰتِ وَ اَخۡبَتُوۡۤا اِلٰی رَبِّہِمۡ ۙ اُولٰٓئِکَ اَصۡحٰبُ الۡجَنَّۃِ ۚ ہُمۡ فِیۡہَا خٰلِدُوۡنَ"
string(183) "ان الذیۡن اٰمنوۡا و عملوا الصٰلحٰت و اخۡبتوۡۤا الٰی ربہمۡ ۙ اولٰئک اصۡحٰب الۡجنۃ ۚ ہمۡ فیۡہا خٰلدوۡن"
string(127) "ان الذن امنوا و عملوا الصلحت و اخبتوا ال ربم اولئ اصحب الجن م فا خلدون"
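Comparing the two results above, the second function has also dropped some base letters (for example U+06CC and U+06C1), because the range U+066A through U+06FF contains extended Arabic letters as well as marks. If that is not wanted, one possible alternative is to match on the Unicode general category instead of hard-coded ranges. This is only a minimal sketch, not something taken from the linked answers: `strip_combining_marks` is a hypothetical name, and it assumes PHP 7 or later with PCRE Unicode property support. It also removes nonspacing marks from any other script in the string, which may be broader than intended.

```php
<?php
// Hypothetical helper (not from the linked answers): strip every
// nonspacing combining mark, whatever block it belongs to.
function strip_combining_marks(string $str): string
{
    // \p{Mn} matches the Unicode general category "Mark, nonspacing";
    // the u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('~\p{Mn}+~u', '', $str);

    // preg_replace() returns null on error (e.g. invalid UTF-8 input);
    // fall back to the unmodified string in that case.
    return $result ?? $str;
}

// The vowelled word spelled with escapes, so that no loose combining
// marks sit in the source file.
$vowelled = "\u{0639}\u{064E}\u{0645}\u{0650}\u{0644}\u{064F}\u{0648}\u{0627}";
$plain    = "\u{0639}\u{0645}\u{0644}\u{0648}\u{0627}";

var_dump(strip_combining_marks($vowelled) === $plain); // bool(true)
```

Writing the test strings with `\u{...}` escapes also sidesteps the concern about literal combining marks in the source code.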
As well, relying on explode() for word splitting is generally not feasible for human-written text, as it will not respect punctuation or other non-space word breaks. This is the exact use case for IntlBreakIterator:
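// Wrapper so the stripping strategy can be swapped in one place.
function strip_arabic_diacritics($str) {
    return strip_arabic_diacritics_2($str);
}

// IntlBreakIterator requires the intl extension.
$br = IntlBreakIterator::createWordInstance();
$br->setText($str);

$output = '';
$ptr_stripped = strip_arabic_diacritics($ptr);

// The parts iterator yields every segment, including whitespace and
// punctuation, so concatenating the parts rebuilds the original string.
foreach ($br->getPartsIterator() as $word) {
    $word_stripped = strip_arabic_diacritics($word);
    if ($ptr_stripped === $word_stripped) {
        $output .= sprintf('<span class="...">%s</span>', $word);
    } else {
        $output .= $word;
    }
}

var_dump($output);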
Output:
string(290) "اِنَّ الَّذِیۡنَ اٰمَنُوۡا وَ <span class="...">عَمِلُوا</span> الصّٰلِحٰتِ وَ اَخۡبَتُوۡۤا اِلٰی رَبِّہِمۡ ۙ اُولٰٓئِکَ اَصۡحٰبُ الۡجَنَّۃِ ۚ ہُمۡ فِیۡہَا خٰلِدُوۡنَ"
The source string looks a bit wonky because of the switches between RTL and LTR, but it should render properly.
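To make the earlier point about explode() concrete, here is a small comparison on an English sentence, where the punctuation problem is easy to see. It is an illustrative sketch only: the sample text and variable names are made up, and it assumes the intl extension is loaded.

```php
<?php
// Requires the intl extension for IntlBreakIterator.
$text = 'Hello, world!';

// Splitting on spaces leaves punctuation glued to the words,
// so an exact comparison against "Hello" never matches.
$by_space = explode(' ', $text);
var_dump(in_array('Hello', $by_space, true)); // bool(false): the token is "Hello,"

// A word break iterator reports punctuation and spaces as separate parts.
$br = IntlBreakIterator::createWordInstance('en');
$br->setText($text);
$parts = iterator_to_array($br->getPartsIterator(), false);
var_dump(in_array('Hello', $parts, true)); // bool(true)
```

Because the parts cover the whole string, joining them back together reproduces the input, which is what lets the highlighting loop rebuild the text around the matched word.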