Summary

I am trying to compute a name-matching percentage in PHP, but before that I need to rearrange the words of the second string so that they follow the word order of the first string.

What is the source code about?

I have two strings. I split each one on spaces into an array: $arraydataBaseName and $arraybankData. For every value of $arraybankData I search $arraydataBaseName and collect the matching key. The keys come out in the right order, but I am unable to place the values at those positions in a new array.

$dataBaseName = "Jardine Lloyd Thompson";
$bankdata = "Thompson Thompson Jardine"; 

$replacedataBaseName = preg_replace("#[\s]+#", " ", $dataBaseName);
$replacebankData = preg_replace("#[\s]+#", " ", $bankdata); 

$arraydataBaseName = explode(" ",$replacedataBaseName);
$arraybankData = explode(" ",$replacebankData); 

echo "<br/>";
print_r($arraydataBaseName);

$a="";
$i="";
$arraysize =  count($arraydataBaseName);

$push=array();
for($i=0;$i< $arraysize;$i++)
{     
  if(array_search($arraybankData[$i],$arraydataBaseName)>0)
  {
    ${"$a$i"} =  array_search($arraybankData[$i],$arraydataBaseName); 
    //echo ${"$a$i"};
    array_push($push,${"$a$i"});
   }    
 }
 print_r($push); 

Case 1:

Input

DatabaseName = Jardine Lloyd Thompson

BankName = Thompson Jardine Lloyd

Output

ExpectedOutput = Jardine Lloyd Thompson

Case 2:

Input

DatabaseName = Jardine Lloyd Thompson

BankName = Thoapson Jordine Llayd

If a word is not found in the DatabaseName above, the search should fall back to the Levenshtein algorithm: the database word with the smallest distance to the bank word is used as the key.

Output

ExpectedOutput = Jordine Llayd Thoapson

Description of Problem

Question Update

When the user input $bankdata contains additional words that cannot be matched, I need to append them to the end.

Pinke Helga

2 Answers

This is a simple version that finds the best match one word at a time, in sequence.

declare (strict_types=1);

$dataBaseName = 'Jardine Lloyd Thompson';

$bankdataRows =
[
  'Thompson Jardine Lloyd',
  'Blaaa  Llayd Thoapson   f***ing user input   Jordine   aso. ',
];

// assume the "database" is already stored trimmed since it is server-side controlled
$dbWords = preg_split("#[\s]+#", $dataBaseName);

foreach ($bankdataRows as $bankdata)
{
  // here we trim the data received from client-side.
  $bankWords = preg_split("#[\s]+#", trim($bankdata));
  $result    = [];

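  if(!empty($bankWords))
    foreach ($dbWords as $dbWord)
    {
      // all bank words are already assigned: stop, otherwise $idx would stay null
      if (empty($bankWords))
        break;

      $idx   = null;
      $least = PHP_INT_MAX;

      // pick the remaining bank word closest to the current database word
      foreach ($bankWords as $k => $bankWord)
        if (($lv = levenshtein($bankWord, $dbWord)) < $least)
        {
          $least = $lv;
          $idx   = $k;
        }

      $result[] = $bankWords[$idx];
      unset($bankWords[$idx]);
    }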

  $result = array_merge($result, $bankWords);
  var_dump($result);
}

result

array(3) {
  [0] =>
  string(7) "Jardine"
  [1] =>
  string(5) "Lloyd"
  [2] =>
  string(8) "Thompson"
}

array(8) {
  [0] =>
  string(7) "Jordine"
  [1] =>
  string(5) "Llayd"
  [2] =>
  string(8) "Thoapson"
  [3] =>
  string(5) "Blaaa"
  [4] =>
  string(7) "f***ing"
  [5] =>
  string(4) "user"
  [6] =>
  string(5) "input"
  [7] =>
  string(4) "aso."
}

See live fiddle

You might want to extend this approach by first calculating the Levenshtein distance of every possible combination and then selecting the best overall match.
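A minimal sketch of such an extension, assuming the names are short enough to try every ordering by brute force. The helper name `bestOrder` is hypothetical and not part of the code above, and the sketch assumes the bank string has at least as many words as the database name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: reorder $bankWords so that the summed Levenshtein
 * distance to $dbWords, compared position by position, is minimal.
 * Tries every permutation, so it is only practical for a handful of words.
 */
function bestOrder(array $dbWords, array $bankWords): array
{
    $best     = $bankWords;
    $bestCost = PHP_INT_MAX;

    $permute = function (array $rest, array $picked) use (&$permute, &$best, &$bestCost, $dbWords): void {
        if ($rest === []) {
            // A complete ordering: sum the distances against the database words.
            $cost = 0;
            foreach ($dbWords as $i => $dbWord) {
                if (!isset($picked[$i])) {
                    break; // fewer bank words than database words
                }
                $cost += levenshtein($dbWord, $picked[$i]);
            }
            // Strict comparison keeps the first ordering found on a tie,
            // which leaves surplus words in their original order.
            if ($cost < $bestCost) {
                $bestCost = $cost;
                $best     = $picked;
            }
            return;
        }

        foreach ($rest as $k => $word) {
            $next = $rest;
            unset($next[$k]);
            $permute($next, array_merge($picked, [$word]));
        }
    };

    $permute($bankWords, []);

    return $best;
}

echo implode(' ', bestOrder(['E', 'SRINIVAS'], ['SRINIVAS', 'ETTAMALLA'])), PHP_EOL;
// ETTAMALLA SRINIVAS
echo implode(' ', bestOrder(['Jardine', 'Lloyd', 'Thompson'], ['Thoapson', 'Jordine', 'Llayd'])), PHP_EOL;
// Jordine Llayd Thoapson
```

The word-by-word version assigns the first candidate on a tie, so a short database word such as 'E' can take a bank word that a later database word matches exactly. Scoring whole orderings avoids that, at the price of factorial running time.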

Pinke Helga
  • Thank you for sharing. I am very close with your solution, but when I pass $dataBaseName = trim(' jardine marks llord thompson'); $bankdataRows =[$dataBaseName,trim('lloyd thodal jardine')]; the output is correct but comes with the error **Notice: Undefined offset: 0**. And in this case, $dataBaseName = trim(' jardine llord thompson'); $bankdataRows =[$dataBaseName,trim('lloyd thodal jardine spark')]; the second variable has 4 words, and if 1 word is unmatchable it should be appended at the end or in the empty slot. Please suggest; I have tried but am facing some issues. –  Jan 20 '19 at 06:27
  • @daoootim Just append the remaining bank words `$result = array_merge($result, $bankWords);` To preserve the order of the remnant I've converted the sorting into a `foreach` loop. – Pinke Helga Jan 20 '19 at 09:50
  • @daoootim If there are more issues not previously described in the question, please open a new question facing one specific issue in compliance to the SO policy. – Pinke Helga Jan 20 '19 at 09:57
  • @ Quasimodo's clone Legend Thanks you are champ –  Jan 20 '19 at 11:30
  • I need your help: when I try the names DatabaseName ='E SRINIVAS' and BankName ='SRINIVAS ETTAMALLA', the ExpectedOutput is 'ETTAMALLA SRINIVAS', but I am getting the output 'SRINIVAS ETTAMALLA'. –  Jan 29 '19 at 12:49

I have split the code into case 1 and case 2.
If the comparison printed by var_export in case 1 is false, you run the case 2 code on the same variables.

//Case 1:
$DatabaseName = "Jardine Lloyd Thompson";
$BankName = "Thompson Jardine Lloyd";

//Split and sort them
$data = explode(" ", $DatabaseName);
$bank = explode(" ", $BankName);
sort($data);
sort($bank);
var_export($data == $bank); // true

//Case 2
$DatabaseName = "Jardine Lloyd Thompson";
$BankName = "Thoapson Jordine Llayd";

//Split and sort
$data = explode(" ", $DatabaseName);
$bank = explode(" ", $BankName);
sort($data);
sort($bank);

// Loop and accumulate the levenshtein return
$lev = 0;
foreach($data as $key => $name){
    $lev += levenshtein($name, $bank[$key]);
}

echo PHP_EOL . $lev; // 3 letters "off"

https://3v4l.org/eP5PE

Example of case 1 and 2 in the same code.

$DatabaseName = "Jardine Lloyd Thompson";
$BankName = "Thoapson Jordine Llayd";

$data = explode(" ", $DatabaseName);
$bank = explode(" ", $BankName);
sort($data);
sort($bank);
if($data == $bank){
    echo "true";
    exit;
    // No need to do levenshtein
}

$lev = 0;
foreach($data as $key => $name){
    $lev += levenshtein($name, $bank[$key]);
}

echo PHP_EOL . $lev;

https://3v4l.org/RJSiB
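The question ultimately asks for a matching percentage. One possible way to derive it from the accumulated distance is sketched below. It assumes the percentage is defined as the share of characters that did not need editing; that definition and the function name `matchPercent` are illustrative assumptions, not something stated in the question.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: sort both word lists, sum the Levenshtein distances
// pair by pair, and express the unedited share of characters as a percentage.
function matchPercent(string $databaseName, string $bankName): float
{
    $data = preg_split('/\s+/', trim($databaseName));
    $bank = preg_split('/\s+/', trim($bankName));
    sort($data);
    sort($bank);

    $lev   = 0;
    $total = 0;
    $count = max(count($data), count($bank));

    for ($i = 0; $i < $count; $i++) {
        // A word missing on either side is compared against an empty string,
        // so it counts as fully unmatched.
        $a = $data[$i] ?? '';
        $b = $bank[$i] ?? '';

        $lev   += levenshtein($a, $b);
        $total += max(strlen($a), strlen($b));
    }

    return $total === 0 ? 100.0 : round(100 * ($total - $lev) / $total, 2);
}

echo matchPercent('Jardine Lloyd Thompson', 'Thoapson Jordine Llayd'), PHP_EOL; // 85
echo matchPercent('Jardine Lloyd Thompson', 'Thompson Jardine Lloyd'), PHP_EOL; // 100
```

Like the code above, this relies on sorting to line the words up, so a typo in the first letter of a word can pair it with the wrong counterpart and lower the score.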

Andreas
  • @Andreas Thanks for your answer, but I need to rearrange the words after finding the Levenshtein distance, **e.g. DatabaseName ='E SRINIVAS' and BankName ='SRINIVAS ETTAMALLA' ExpectedOutput = 'ETTAMALLA SRINIVAS'** –  Jan 31 '19 at 05:24