After trying to figure how to have an effective word counter of a string, I know about the existing function that PHP has str_word_count
but unfortunately it doesn't do what I need it to do because I will need to count the number of words that includes English, Chinese, Japanese and other accented characters.
However str_word_count
fails to count the number of words unless you add the characters in the third argument but this is insane, it could mean I have to add every single character in the Chinese, Japanese, accented characters (etc) language but this is not what I need.
Tests:
str_word_count('The best tool'); // int(3)
str_word_count('最適なツール'); // int(0)
str_word_count('最適なツール', 0, '最ル'); // int(5)
Anyway, I found this function online, it could do the job, but sadly it fails to count:
function word_count($str)
{
if($str === '')
{
return 0;
}
return preg_match_all("/\p{L}[\p{L}\p{Mn}\p{Pd}'\x{2019}]*/u", $str);
}
Tests:
word_count('The best tool') // int(3)
word_count('最適なツール'); // int(1)
// With spaces
word_count('最 適 な ツ ー ル'); // int(5)
Basically I'm looking for a good UTF-8 supported word counter that can count words from every typical word/accented/language symbols - is there a possible solution to this?