Note: I am not familiar with the Japanese writing system.
Looking at the function, the iconv call appears to remove all the Japanese characters. Instead of using iconv
to transliterate, it may be easier to write a small function that does it:
function _toSlugTransliterate($string) {
// Lowercase equivalents found at:
// https://github.com/kohana/core/blob/3.3/master/utf8/transliterate_to_ascii.php
$lower = [
'à'=>'a','ô'=>'o','ď'=>'d','ḟ'=>'f','ë'=>'e','š'=>'s','ơ'=>'o',
'ß'=>'ss','ă'=>'a','ř'=>'r','ț'=>'t','ň'=>'n','ā'=>'a','ķ'=>'k',
'ŝ'=>'s','ỳ'=>'y','ņ'=>'n','ĺ'=>'l','ħ'=>'h','ṗ'=>'p','ó'=>'o',
'ú'=>'u','ě'=>'e','é'=>'e','ç'=>'c','ẁ'=>'w','ċ'=>'c','õ'=>'o',
'ṡ'=>'s','ø'=>'o','ģ'=>'g','ŧ'=>'t','ș'=>'s','ė'=>'e','ĉ'=>'c',
'ś'=>'s','î'=>'i','ű'=>'u','ć'=>'c','ę'=>'e','ŵ'=>'w','ṫ'=>'t',
'ū'=>'u','č'=>'c','ö'=>'o','è'=>'e','ŷ'=>'y','ą'=>'a','ł'=>'l',
'ų'=>'u','ů'=>'u','ş'=>'s','ğ'=>'g','ļ'=>'l','ƒ'=>'f','ž'=>'z',
'ẃ'=>'w','ḃ'=>'b','å'=>'a','ì'=>'i','ï'=>'i','ḋ'=>'d','ť'=>'t',
'ŗ'=>'r','ä'=>'a','í'=>'i','ŕ'=>'r','ê'=>'e','ü'=>'u','ò'=>'o',
'ē'=>'e','ñ'=>'n','ń'=>'n','ĥ'=>'h','ĝ'=>'g','đ'=>'d','ĵ'=>'j',
'ÿ'=>'y','ũ'=>'u','ŭ'=>'u','ư'=>'u','ţ'=>'t','ý'=>'y','ő'=>'o',
'â'=>'a','ľ'=>'l','ẅ'=>'w','ż'=>'z','ī'=>'i','ã'=>'a','ġ'=>'g',
'ṁ'=>'m','ō'=>'o','ĩ'=>'i','ù'=>'u','į'=>'i','ź'=>'z','á'=>'a',
'û'=>'u','þ'=>'th','ð'=>'dh','æ'=>'ae','µ'=>'u','ĕ'=>'e','ı'=>'i',
];
return str_replace(array_keys($lower), array_values($lower), $string);
}
So, with some modifications, it could look something like this:
function toSlug($string, $separator = '-') {
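// The iconv() call appears to strip the Japanese characters, so it is
// disabled here and replaced by the lookup table above.
#$string = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
// Lowercase first: the lookup table only contains lowercase letters, so
// 'É' has to become 'é' before it can be transliterated.
// mb_strtolower() is multibyte-aware, unlike strtolower().
$string = mb_strtolower($string, 'UTF-8');
$string = _toSlugTransliterate($string);
// Remove unwanted chars + trim excess whitespace
// I got the character ranges from the following URL:
// https://stackoverflow.com/questions/6787716/regular-expression-for-japanese-characters#10508813
// (ａ-ｚＡ-Ｚ０-９ are the full-width forms of the ASCII letters and digits)
$regex = '/[^一-龠ぁ-ゔァ-ヴーa-zA-Z0-9ａ-ｚＡ-Ｚ０-９々〆〤.+ -]|^\s+|\s+$/u';
$string = preg_replace($regex, '', $string);
// Same as before, but escape the separator so it is safe inside the character class
$string = preg_replace('/[ ' . preg_quote($separator, '/') . ']+/', $separator, $string);
return $string;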
}
$x = ' æøå!this.ís-a test-ゔヴ ーァ ';
echo toSlug($x); // prints: aeoathis.is-a-test-ゔヴ-ーァ
In a regex you can use Unicode "scripts" to match letters of various languages. There is no "Japanese" script, but there are Hiragana, Katakana and Han. As I have no idea how Japanese is written or how these should be combined, I am not even going to try.
Using these scripts, however, would look something like this (the u modifier is needed so that the pattern and the subject are treated as UTF-8):
'/[\p{Hiragana}\p{Katakana}\p{Han}]+/u'
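For what it is worth, here is a rough sketch of how the script classes could replace the hand-picked ranges in a slug function. The name toSlugByScript is made up for this example, and the choice of which characters to keep is my assumption, not something I have checked against real Japanese text. It leaves out the transliteration step to stay short.

```php
<?php
// Hypothetical variant of toSlug() that whitelists characters by Unicode
// script instead of by hand-picked code point ranges.
function toSlugByScript(string $string, string $separator = '-'): string
{
    // Keep Hiragana, Katakana, Han, ASCII letters/digits, spaces and hyphens.
    // 'ー' (U+30FC) is listed explicitly because its Unicode Script property
    // is Common rather than Katakana.
    $unwanted = '/[^\p{Hiragana}\p{Katakana}\p{Han}ーa-zA-Z0-9 -]/u';

    // preg_replace() returns null on error (e.g. invalid UTF-8 input),
    // so fall back to an empty string in that case.
    $string = preg_replace($unwanted, '', $string) ?? '';
    $string = mb_strtolower(trim($string), 'UTF-8');

    // Collapse runs of spaces and separators into a single separator.
    $collapse = '/[ ' . preg_quote($separator, '/') . ']+/';
    return preg_replace($collapse, $separator, $string) ?? '';
}

echo toSlugByScript(' this is-a test-ゔヴ ーァ 漢字 '), "\n";
// prints: this-is-a-test-ゔヴ-ーァ-漢字
```

The script classes also let through characters from those scripts that the explicit ranges would have rejected, so whether this is an improvement depends on what the slugs are allowed to contain.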