I'm looking for a Chinese word segmentation function for PHP.
Chinese text has no spaces between words, which affects full-text search.
For example:
$_GET['text'] = "中文分詞搜尋";
$text = $_GET['text']; // the user's input
$text -> Chinese segmentation function -> $text = "中文 分詞 搜尋"; // the desired result
Libraries for this are easy to find with a web search.
For performance, the core algorithm is usually implemented in a native language such as C or C++.
There is also one based on a RESTful API (with a PHP interface):
A pure PHP implementation (may be slow):
An online web service with a PHP client driver:
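Try splitting the string into individual characters:
<?php
$str = '蚂蚁学院,欢迎您的光临!';

// Split a multibyte string into an array of single characters.
// $charset must match the real encoding of $str: "UTF-8" here,
// or "gbk" if the script file is saved as GBK.
function mbstringtoarray($str, $charset) {
    $array = array();
    $strlen = mb_strlen($str, $charset);
    while ($strlen) {
        $array[] = mb_substr($str, 0, 1, $charset);   // take the first character
        $str = mb_substr($str, 1, $strlen, $charset); // drop it from the string
        $strlen = mb_strlen($str, $charset);
    }
    return $array;
}

$arr = mbstringtoarray($str, "UTF-8");
print_r($arr);
?>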
The output will be:
Array
(
    [0] => 蚂
    [1] => 蚁
    [2] => 学
    [3] => 院
    [4] => ,
    [5] => 欢
    [6] => 迎
    [7] => 您
    [8] => 的
    [9] => 光
    [10] => 临
    [11] => !
)
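On PHP 7.4 or later, the same per-character split can probably be done with the built-in mb_str_split, assuming the mbstring extension is loaded and the string is UTF-8. A short sketch:

```php
<?php
// Assumes PHP >= 7.4 with the mbstring extension and a UTF-8 encoded string.
$str = '蚂蚁学院,欢迎您的光临!';

// Split into chunks of 1 character each, interpreting the bytes as UTF-8.
$chars = mb_str_split($str, 1, 'UTF-8');

print_r($chars);
```

Like the loop version, this yields single characters, not words.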
This cannot divide the text exactly word by word, as in 蚂蚁/学院/欢迎/您/的/光临.
If you need that, you will need a separate table that stores the meaningful words, because PHP does not recognize them by default.
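One simple way to use such a word table is forward maximum matching: at each position, take the longest dictionary word that starts there, and fall back to a single character if none matches. The sketch below is only an illustration. The function name segment_fmm, the tiny $dictionary array and the $maxLen limit are made up for this example; a real word list would be much larger and would come from a database table or a file. It also assumes UTF-8 input.

```php
<?php
// Hypothetical helper: forward maximum matching against a word list.
// $dictionary is a plain array of known words; $maxLen is the longest
// word length (in characters) that will be tried.
function segment_fmm(string $text, array $dictionary, int $maxLen = 4): array
{
    $dict  = array_flip($dictionary); // word => index, for fast isset() lookups
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY); // UTF-8 characters
    $n     = count($chars);
    $words = [];
    $i     = 0;

    while ($i < $n) {
        $matched = 1; // fall back to a single character if no word matches
        // Try the longest candidate first, then shorter ones.
        for ($len = min($maxLen, $n - $i); $len > 1; $len--) {
            $candidate = implode('', array_slice($chars, $i, $len));
            if (isset($dict[$candidate])) {
                $matched = $len;
                break;
            }
        }
        $words[] = implode('', array_slice($chars, $i, $matched));
        $i += $matched;
    }
    return $words;
}

// Toy dictionary for illustration only.
$dictionary = ['中文', '分詞', '搜尋', '蚂蚁', '学院', '欢迎', '光临'];

echo implode(' ', segment_fmm('中文分詞搜尋', $dictionary)), "\n";
echo implode('/', segment_fmm('蚂蚁学院欢迎您的光临', $dictionary)), "\n";
```

With this toy dictionary the two calls print 中文 分詞 搜尋 and 蚂蚁/学院/欢迎/您/的/光临. The result is only as good as the word list, and greedy matching can pick the wrong split for ambiguous text, which is why the dedicated segmentation libraries use larger dictionaries and statistical models.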