I want to detect the language used on a web page. My approach is to guess it from which words of a keyword list appear in the text.
I got this script from http://www.kangsigit.com/2017/08/php.deteksi-bahasa.html
The code works by simply matching words against the "INDONESIAN" and "ENGLISH" keyword lists.
Whichever language's keywords appear in the text is reported as the detected language.
The code:
$tulisan = "Hari ini saya dapat senyum oleh suatu hal";

function Bahasa($tulisan, $terjemahkan) {
    $bahasa_pilihan = array('INDONESIAN', 'ENGLISH');
    $katakunci['INDONESIAN'] = array('cinta', 'marah', 'sayang', 'benci', 'senyum', 'peluk');
    $katakunci['ENGLISH'] = array('the', 'and', 'have', 'for', 'with', 'you');
    $tulisan = preg_replace("/[^A-Za-z]/", ' ', $tulisan);
    foreach ($bahasa_pilihan as $bahasa) {
        $kalkulasi[$bahasa] = 0;
    }
    for ($i = 0; $i < 6; $i++) {
        foreach ($bahasa_pilihan as $bahasa) {
            $kalkulasi[$bahasa] = $kalkulasi[$bahasa] +
                substr_count($tulisan, ' ' . $katakunci[$bahasa][$i] . ' ');
        }
    }
    $max = max($kalkulasi);
    $maxs = array_keys($kalkulasi, $max);
    if (count($maxs) == 1) {
        $pemenang = $maxs[0];
        $pertamax = 0;
        foreach ($bahasa_pilihan as $bahasa) {
            if ($bahasa <> $pemenang) {
                if ($kalkulasi[$bahasa] > $pertamax) {
                    $pertamax = $kalkulasi[$bahasa];
                }
            }
        }
        if (($pertamax / $max) < 0.1) {
            return $pemenang;
        }
    }
    return $terjemahkan;
}

echo Bahasa($tulisan, $terjemahkan);
But there is a problem here.
If keywords from both the "INDONESIAN" and "ENGLISH" lists appear in the text, the script produces an error.
For example, if I change the text like this:
$tulisan = "Hari ini saya dapat senyum oleh suatu hal, you know?";
The two words "senyum" and "you" come from different keyword lists, and this generates an error.
Is there a way to fix it?
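To make the problem concrete, here is a minimal sketch that only tallies the keyword hits, using the same lists and the same substr_count matching as the script above. The helper name hitungKataKunci is made up for illustration and is not part of the original script. For the modified sentence both languages get one hit, so count($maxs) is 2 and Bahasa() falls through to return $terjemahkan, which is never defined at the call site in the code as posted. That is probably where the error comes from.

```php
<?php
// Hypothetical helper (not in the original script): returns the number of
// keyword hits per language for the given text.
function hitungKataKunci(string $tulisan): array {
    $katakunci = [
        'INDONESIAN' => ['cinta', 'marah', 'sayang', 'benci', 'senyum', 'peluk'],
        'ENGLISH'    => ['the', 'and', 'have', 'for', 'with', 'you'],
    ];
    // Same normalisation as the script: every non-letter becomes a space.
    $tulisan = preg_replace('/[^A-Za-z]/', ' ', $tulisan);

    $kalkulasi = [];
    foreach ($katakunci as $bahasa => $daftar) {
        $kalkulasi[$bahasa] = 0;
        foreach ($daftar as $kata) {
            // Count the keyword only when it is surrounded by spaces.
            $kalkulasi[$bahasa] += substr_count($tulisan, ' ' . $kata . ' ');
        }
    }
    return $kalkulasi;
}

// Tallies for the modified sentence: INDONESIAN => 1, ENGLISH => 1 (a tie).
print_r(hitungKataKunci('Hari ini saya dapat senyum oleh suatu hal, you know?'));
```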
UPDATE:
If the text contains two Indonesian keywords and only one English keyword, then Indonesian should be the winner. But the code above does not work as I expected.
For example:
$tulisan = "Hari ini saya cinta dan dapat senyum oleh suatu hal, you know?";
There are two Indonesian keywords (cinta and senyum) and one English keyword (you).
So the detected language should be INDONESIAN.
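As far as I can tell, the script rejects this case because of the check ($pertamax / $max) < 0.1: with 2 Indonesian hits and 1 English hit the ratio is 1 / 2 = 0.5, which is not below 0.1, so the function again returns $terjemahkan. A minimal sketch of the rule I actually expect is below, assuming the per-language tallies have already been computed. The names pilihBahasa and $cadangan and the 'UNKNOWN' fallback are my own assumptions, not part of the original script.

```php
<?php
// Hypothetical helper: picks the language with the strictly highest tally.
// Returns $cadangan (the fallback) on a tie or when no keyword matched.
function pilihBahasa(array $kalkulasi, string $cadangan = 'UNKNOWN'): string {
    if ($kalkulasi === []) {
        return $cadangan; // max() would fail on an empty array
    }
    $max  = max($kalkulasi);
    $maxs = array_keys($kalkulasi, $max); // every language that reached $max
    if ($max === 0 || count($maxs) !== 1) {
        return $cadangan; // no hits at all, or two languages tied
    }
    return $maxs[0];
}

echo pilihBahasa(['INDONESIAN' => 2, 'ENGLISH' => 1]); // prints INDONESIAN
```

With this rule the earlier tie (one hit each) still returns the fallback, which seems safer than guessing between the two languages.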