I am trying to convert a string to an SEO-friendly URL. For this I have written the code below and set the table column collation to utf8_general_ci. It works for English but not for Bengali: for a Bengali string it just outputs a single hyphen (-).

 function seo_url( $string, $separator = '-' )
 {
   $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
   $special_cases = array( '&' => 'and', "'" => '');
   $string = mb_strtolower( trim( $string ), 'UTF-8' );
   $string = str_replace( array_keys($special_cases), array_values( $special_cases), $string );
   $string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
   $string = preg_replace("/[^a-z0-9]/u", "$separator", $string);
   $string = preg_replace("/[$separator]+/u", "$separator", $string);
   return $string;
 }

Is there a solution that also works for Unicode text such as Bengali?

Mithu
  • Can you give us an example of an original string you're trying to convert, please? – Camille Dec 10 '21 at 10:18
  • For example, this string: "নিরাপদ সড়কের সব উদ্যোগ আটকে যাচ্ছে ". If you try to convert it to an SEO URL using the function, it will just output a single hyphen (-). – Mithu Dec 10 '21 at 10:19
  • OK, thanks. So what result are you expecting? I don't know Bengali, but I faced the same problem with Russian URLs. Do you want some "phonetic" equivalent URL? What's the current best practice for Bengali URLs? – Camille Dec 10 '21 at 10:23
  • Like this: নিরাপদ-সড়কের-সব-উদ্যোগ-আটকে-যাচ্ছে, with a hyphen in the gap between two words. – Mithu Dec 10 '21 at 10:29
  • This is a situation when you could really use [a third-party library](https://github.com/indic-transliteration/sanscript.php). – Álvaro González Dec 10 '21 at 10:52

1 Answer

To accept Bengali characters (or those of any other language), you have to change the regex on this line:

 $string = preg_replace("/[^a-z0-9]/u", "$separator", $string);

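Currently, it means "replace any character that is not a lowercase ASCII letter (a-z) or a digit (0-9) with the separator", so every Bengali character is replaced. Use a regex that instead means "replace any character that is not a letter, a combining mark, or a number in any language" (\p{L} matches letters, \p{M} matches combining marks such as Bengali vowel signs, and \p{N} matches digits):

 $string = preg_replace("/[^\p{L}\p{M}\p{N}]/u", "$separator", $string);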

With this line changed, your function should work for Bengali. More information and a related answer here: https://stackoverflow.com/a/6005511/15282066
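For reference, here is a minimal sketch of the whole function with a Unicode-aware character class. The name seo_url_unicode is hypothetical, and keeping digits via \p{N}, collapsing runs with + in a single pass, and trimming separators from the ends are my own assumptions rather than part of the original function. It also assumes the mbstring extension is available, as the question's code already does.

```php
<?php
// Sketch only: the question's seo_url() with a Unicode-aware character class.
// seo_url_unicode is a hypothetical name chosen for this example.
function seo_url_unicode(string $string, string $separator = '-'): string
{
    $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    $special_cases = array('&' => 'and', "'" => '');

    $string = mb_strtolower(trim($string), 'UTF-8');
    $string = str_replace(array_keys($special_cases), array_values($special_cases), $string);

    // Turn accented Latin letters into their base letter via their HTML entity name.
    $string = preg_replace($accents_regex, '$1', htmlentities($string, ENT_QUOTES, 'UTF-8'));

    // Keep letters (\p{L}), combining marks (\p{M}) and digits (\p{N}) of any script;
    // every run of other characters becomes a single separator.
    // Assumption: digits are kept, as in the original [^a-z0-9] class.
    $string = preg_replace('/[^\p{L}\p{M}\p{N}]+/u', $separator, $string);

    // Assumption: leading and trailing separators are not wanted in the slug.
    return trim($string, $separator);
}

echo seo_url_unicode("Hello World's Café & Bar 2021"), "\n"; // hello-worlds-cafe-and-bar-2021
echo seo_url_unicode("নিরাপদ সড়কের সব উদ্যোগ আটকে যাচ্ছে "), "\n";
```

The second call should print the Bengali words joined by hyphens, which is the form asked for in the comments. Note that the browser and web server must both handle percent-encoded UTF-8 paths for such URLs to work.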

Camille