Brief

Help me write a new function, or change the function correct(), so that it corrects the input text case-insensitively and keeps the letter case of the input.


Example

Usage

Example usage for the correct() method:

$text = "Точик ТОЧИК точик ТоЧиК тоЧИК";

$text = correct($text, $base_words);
echo "$text";

Expected Result

Input: Точик ТОЧИК точик ТоЧиК тоЧИК
Output: Тоҷик ТОҶИК тоҷик ТоҶиК тоҶИК


Code

Here are all the arrays and functions below so you can easily copy them:

$default_words = array
(
    'бур',
    'кори',
    'давлати',
    'забони',
    'фанни'
);

$base_words = array
(
    "точик"    => "тоҷик",
    "точики"   => "тоҷики",
    "точикон"  => "тоҷикон",
    "чахонгир" => "ҷаҳонгир",
    "галат"    => "ғалат",
    "уктам"    => "ӯктам",
);

$base_special_words = array
(
    "кори хатти"     => "кори хаттӣ",
    "хатти аз"       => "хаттӣ аз",
    "забони точики"  => "забони тоҷикӣ",
    "точики барои"   => "тоҷикӣ барои",
    "забони давлати" => "забони давлатӣ",
    "давлати дар"    => "давлатӣ дар",
    "микёси чахони"  => "миқёси ҷаҳонӣ",
);


function correct($request, $dictionary)
{
    $search  = array("ғ","ӣ","ҷ","ҳ","қ","ӯ","Ғ","Ӣ","Ҷ","Ҳ","Қ","Ӯ");
    $replace = array("г","и","ч","х","к","у","Г","И","Ч","Х","К","У");
    // Replace the special Tajik letters with their plain Cyrillic counterparts
    $request = str_replace($search, $replace, $request);

    $result = preg_replace_callback("/\pL+/u", function ($m) use ($dictionary) {
        $word = mb_strtolower($m[0]);
        if (isset($dictionary[$word])) {
            $repl = $dictionary[$word];
            // Check for some common upper/lower-case patterns
            // 1. All lower case
            if ($word === $m[0]) return $repl;
            // 2. All upper case
            if (mb_strtoupper($word) === $m[0]) return mb_strtoupper($repl);
            // 3. Only the first letter is upper case
            if (mb_convert_case($word, MB_CASE_TITLE) === $m[0]) return mb_convert_case($repl, MB_CASE_TITLE);
            // Otherwise: decide for each character whether it should be upper or lower case
            $mixed = array();
            for ($i = 0, $len = mb_strlen($word); $i < $len; ++$i) {
                $mixed[] = mb_substr($word, $i, 1) === mb_substr($m[0], $i, 1)
                    ? mb_substr($repl, $i, 1)
                    : mb_strtoupper(mb_substr($repl, $i, 1));
            }
            return implode("", $mixed);
        }
        return $m[0]; // Nothing changes
    }, $request);

    return $result;
}

Questions

How do I properly correct the input text?

Input
Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони.
Output
Кори хаттӣ аз фанни забони тоҷикӣ барои забони давлатӣ дар миқёси ҷаҳонӣ.

Most likely, the text needs to be fixed step by step using the three arrays. My own algorithm did not give suitable results, so I created an additional array of two-word phrases ($base_special_words).

My algorithm corrects the sentence using words from the dictionaries:

Step 1.

Build a temporary array from those elements of the $base_special_words array that occur in the sentence. The temporary array looks like this:

$temp_for_base_special_words = array
(
    "кори хатти",
    "хатти аз",
    "забони точики",
    "точики барои",
    "забони давлати",
    "давлати дар",
    "микёси чахони",   
);

All of these phrases occur in the sentence. Next, cut out of the sentence the phrases that are in the temporary array. The result looks like this:

Full sentence before cutting:
Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони. Точик мард аст.
Cut part of the sentence:
Кори хатти аз забони точики барои забони давлати дар микёси чахони
Sentence after cutting:
фанни. Точик мард аст.

Step 2.

The remaining part of the sentence is then checked against the $default_words array, and the words from the sentence that are in this array are cut out.

Sentence before cutting in step 2:
фанни. Точик мард аст.
Cut part:
фанни
Sentence after cutting:
. Точик мард аст.
Array with the cut words:
$temp_for_default_words = array("фанни");

Step 3.

Cut from the rest of the sentence those words that are present in the $base_words array.

Sentence before cutting in step 3:
. Точик мард аст.
Cut part:
Точик
Sentence after cutting:
. мард аст.
Array with the cut words:
$temp_for_base_words = array ("точик");

The rest of the sentence must be temporarily cut out and hidden so that it is not processed.

Hidden part of the sentence:
. мард аст.

Finally, perform the replacements using the three new arrays and the dictionaries, then restore the hidden part.

Correcting step

Step 1.

Using `$temp_for_base_special_words`:


Look up each value of $temp_for_base_special_words as a key in $base_special_words ($base_special_words[$value]) and replace that key with its value in the input text.

Step 2.

Using `$temp_for_default_words`:


Look up each value of $temp_for_default_words as a key in $base_default_words ($base_default_words[$value]) and replace that key with its value in the input text.

Step 3.

Using `$temp_for_base_words`:


Look up each value of $temp_for_base_words as a key in $base_words ($base_words[$value]) and replace that key with its value in the input text.

Step 4.

Return the hidden part of the text to its original position.
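
The steps above can be sketched roughly as follows. This is only a minimal sketch under some assumptions, not a definitive implementation: the names `match_case()` and `correct_text()` are hypothetical, the input is assumed to be typed in plain Cyrillic already, every dictionary value is assumed to have the same number of characters as its key, and phrases are assumed to be exactly two words separated by whitespace. Instead of physically cutting and hiding parts of the sentence, it splits the text into words and separators, matches phrases against the original words (so overlapping phrases such as "кори хатти" and "хатти аз" both apply), skips the words from `$default_words`, and only then looks up the remaining words in `$base_words`.

```php
<?php
// Dictionaries copied from the question ($base_words is shortened).
$default_words = ['бур', 'кори', 'давлати', 'забони', 'фанни'];
$base_words = ['точик' => 'тоҷик', 'точики' => 'тоҷики', 'точикон' => 'тоҷикон'];
$base_special_words = [
    'кори хатти'     => 'кори хаттӣ',
    'хатти аз'       => 'хаттӣ аз',
    'забони точики'  => 'забони тоҷикӣ',
    'точики барои'   => 'тоҷикӣ барои',
    'забони давлати' => 'забони давлатӣ',
    'давлати дар'    => 'давлатӣ дар',
    'микёси чахони'  => 'миқёси ҷаҳонӣ',
];

// Hypothetical helper: copy the upper/lower-case pattern of $original onto
// $replacement. Assumes both strings have the same number of characters.
function match_case(string $original, string $replacement): string
{
    $out = '';
    $len = mb_strlen($replacement, 'UTF-8');
    for ($i = 0; $i < $len; $i++) {
        $src = mb_substr($original, $i, 1, 'UTF-8');
        $dst = mb_substr($replacement, $i, 1, 'UTF-8');
        $out .= ($src !== mb_strtolower($src, 'UTF-8')) ? mb_strtoupper($dst, 'UTF-8') : $dst;
    }
    return $out;
}

// Hypothetical function: phrases first, then protected words, then single words.
function correct_text(string $text, array $phrases, array $protected, array $words): string
{
    // Even indexes hold separators, odd indexes hold words.
    $tokens = preg_split('/(\pL+)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $count  = count($tokens);
    $lower  = [];
    for ($i = 1; $i < $count; $i += 2) {
        $lower[$i] = mb_strtolower($tokens[$i], 'UTF-8');
    }

    $fixed = []; // token index => corrected lower-case word

    // Step 1: two-word phrases, always matched against the original words.
    for ($i = 1; $i + 2 < $count; $i += 2) {
        $key = $lower[$i] . ' ' . $lower[$i + 2];
        if (trim($tokens[$i + 1]) !== '' || !isset($phrases[$key])) {
            continue; // punctuation between the words, or unknown phrase
        }
        [$first, $second] = explode(' ', $phrases[$key]);
        foreach ([$i => $first, $i + 2 => $second] as $pos => $new) {
            // An actual change wins over an earlier "leave as is" entry.
            if (!isset($fixed[$pos]) || $new !== $lower[$pos]) {
                $fixed[$pos] = $new;
            }
        }
    }

    // Steps 2 and 3: skip phrase words and protected words, then use $words.
    foreach ($lower as $pos => $word) {
        if (isset($fixed[$pos]) || in_array($word, $protected, true)) {
            continue;
        }
        if (isset($words[$word])) {
            $fixed[$pos] = $words[$word];
        }
    }

    // Step 4: write the corrections back, keeping the input's letter case.
    foreach ($fixed as $pos => $new) {
        $tokens[$pos] = match_case($tokens[$pos], $new);
    }
    return implode('', $tokens);
}

echo correct_text(
    'Кори хатти аз фанни забони точики барои забони давлати дар микёси чахони.',
    $base_special_words,
    $default_words,
    $base_words
), "\n";
```

With the dictionaries above, the call at the end should print the output requested in the question: "Кори хаттӣ аз фанни забони тоҷикӣ барои забони давлатӣ дар миқёси ҷаҳонӣ." Untouched words and punctuation never leave the token list, so there is nothing to hide and restore separately.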
John
  • I changed my question, @Wiktor Stribiżew. This is my project for correcting words mistyped by users in a form, using my example dictionary. – John Oct 24 '17 at 15:56
  • Can I get the result I need? @Wiktor Stribiżew – John Oct 24 '17 at 16:00
  • You could always restrict the initial input to a set of characters but in terms of replacing, a better option might be to use an associative array where the keys (characters that are to be replaced) are given values (the value to replace it with) so that you end up with something like `['ғ' => 'г']` (obviously with all the other values) and then use a foreach loop to replace instances of the key with the value. This doesn't even require regex as you can use `str_ireplace()`, a case-insensitive string replacement function built into PHP. If you do care about case, use `str_replace()` instead – ctwheels Oct 24 '17 at 16:06
  • To build the incorrect version of a word, I strip the special letters, then use the incorrect version as the array key and the correct version as the value. Conversely, the key could be the correct version and the value the incorrect one, but I see no point in that; I think the result would be the same. @ctwheels – John Oct 24 '17 at 16:12
  • Does this function work correctly with Unicode in my case? And does it return the corrected text in the same letter case as the input? For example, with the input text "тоЧИК" and the dictionary entry "тоҷик", can `str_replace()` or `str_ireplace()` return the result as "тоҶИК"? Can you show it with a demonstration? @ctwheels – John Oct 24 '17 at 16:19
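
A small check of the question raised in the last comment, offered as a sketch and not as an authoritative answer. It assumes the file is saved as UTF-8 and that `setlocale()` has not been called. As far as the PHP manual describes it, `str_ireplace()` folds case only for ASCII since PHP 8.2 (before that it depended on the locale), so Cyrillic letters are compared byte by byte, and in any case the replacement is inserted exactly as given.

```php
<?php
// Assumes a UTF-8 source file and the default "C" locale.
$wrong = 'точик';
$right = 'тоҷик';

// Lower-case input matches byte for byte, so it is replaced.
$lowerResult = str_ireplace($wrong, $right, 'точик');

// Mixed-case input is left untouched: the Cyrillic bytes are not case-folded.
$mixedResult = str_ireplace($wrong, $right, 'тоЧИК');

// mb_strtolower() does fold Cyrillic, so a lower-cased copy works as a lookup key.
$foldedMatches = mb_strtolower('тоЧИК', 'UTF-8') === $wrong;

var_dump($lowerResult === $right);   // the plain lower-case word was corrected
var_dump($mixedResult === 'тоЧИК');  // the mixed-case word came back unchanged
var_dump($foldedMatches);            // the mb_* functions recognise the word
```

Under those assumptions all three checks should report true, which suggests that `str_ireplace()` alone cannot produce "тоҶИК"; the case pattern has to be re-applied separately, for example character by character with the `mb_*` functions.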

1 Answer

What @ctwheels meant is that you can use str_ireplace (documentation) if you want to correct words case-insensitively.

<?php
     // Placeholder dictionary: fill it with "wrong word" => "correct word" pairs
     $YourArrayWithCorrectWord = array();
     $test="Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
     $word=explode(" ",$test); // Split the text into individual words (see the documentation links below)
     foreach($word as $key=>$value) {
        if (array_key_exists($value,$YourArrayWithCorrectWord)) {
            $word[$key]=$YourArrayWithCorrectWord[$value]; // Replace the wrong word with the correct word from the dictionary
        }
     }

     $TestCorrect=implode(" ",$word); // Join the words back into a single string
?>

If there is something you don't understand, write to me.

I hope I have helped you.

Documentation: Here is the documentation for explode

Here is the documentation for implode

Here is the documentation for array_key_exists

P.S. This method has the problem that it can't correct two or more words together.

L. Ros.