2

I want to search whether the complete string or a part of the string is a part of the array. How can this be achieved in PHP?

Also, how can I use metaphone in it as well?

Example:

array1={'India','USA','China'};
array2={'India is in east','United States of America is USA','Made in China'}

If I search for array1 in array2, then:

'India' should match 'India is in east' and similarly for USA & China.

Lightness Races in Orbit
Viral Jain

3 Answers

4
$array1 = array('India','USA','China');
$array2 = array('India is in east','United States of America is USA','Made in China');
$found = array();

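foreach ($array1 as $key => $value) {
    // Thanks to @Andrea for this suggestion.
    // preg_quote() escapes any regex metacharacters in the search term.
    $pattern = '/' . preg_quote($value, '/') . '/';
    $found[$value] = preg_grep($pattern, $array2);
    // Alternative:
    //$found = $found + preg_grep($pattern, $array2);
}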

print_r($found);

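Result (using the commented-out alternative, which merges all matches into one flat array; the active line instead yields an array of arrays keyed by search term):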

Array
(
    [0] => India is in east
    [1] => United States of America is USA
    [2] => Made in China
)

Using Metaphone is trickier. You will have to determine what constitutes a match. One way to do that is to use the Levenshtein distance between the Metaphone results for the two values being compared.

Update: See @Andrea's solution for a more sensible per-word Metaphone comparison.

Here's a rough example:

// create_function() was removed in PHP 8, so use a closure instead.
// Each element becomes array(metaphone key => original string).
$toMeta = function ($v) {
    return array(metaphone($v) => $v);
};

$meta1 = array_map($toMeta, $array1);
$meta2 = array_map($toMeta, $array2);

$threshold = 3;

foreach ($meta2 as $key2 => $value2) {

    $k2 = key($value2);
    $v2 = $value2[$k2];

    foreach ($meta1 as $key1 => $value1) {

        $k1  = key($value1);
        $v1  = $value1[$k1];
        $lev = levenshtein($k2, $k1);

        if( strpos($v2, $v1) !== false || $lev <= $threshold ) {
            array_push( $found, $v2 );
        }
    }
}

...but it needs work. It produces duplicates if the threshold is too high. You may prefer to run the match in two passes: one to find simple matches, as in my first code example, and another to match with Metaphone only if the first returns no matches.
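The two-pass idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `two_pass_match()` is a hypothetical helper name, the default threshold of 3 is simply carried over from the example above, and results are keyed by the original array index so that the same sentence cannot be added twice.

```php
<?php
// Hypothetical helper (not part of the original answer).
// Pass 1: plain, case-sensitive substring match.
// Pass 2: runs only when pass 1 finds nothing, and compares the
// Metaphone key of each whole sentence against the needle's key.
function two_pass_match($needle, array $haystacks, $threshold = 3)
{
    $matches = array();

    foreach ($haystacks as $key => $haystack) {
        if (strpos($haystack, $needle) !== false) {
            $matches[$key] = $haystack; // keyed by index: no duplicates
        }
    }
    if ($matches) {
        return $matches;
    }

    $needleKey = metaphone($needle);
    foreach ($haystacks as $key => $haystack) {
        if (levenshtein(metaphone($haystack), $needleKey) <= $threshold) {
            $matches[$key] = $haystack;
        }
    }
    return $matches;
}

$array2 = array('India is in east', 'United States of America is USA', 'Made in China');
print_r(two_pass_match('China', $array2));
```

Because the Metaphone pass still compares a whole sentence against a single word, the threshold would likely need tuning for real data.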

Community
Mike
  • Maybe $found[$value] = preg_grep("/$value/", $array2); would be better, so that all matches are kept. – aercolino Jul 19 '11 at 11:40
  • @Andrea: That might work, but risks overwriting array keys if, say, USA is found in more than one of the searched elements. – Mike Jul 19 '11 at 11:46
  • I tested before posting, with an $array2 like the one you describe, and it worked pretty well. – aercolino Jul 19 '11 at 12:00
  • @Andrea: Oh, I see - you'll get an array of arrays, indexed by the search value. Yes, that works well, and I've updated my answer. Thanks. – Mike Jul 19 '11 at 12:06
  • Would the anonymous down-voter mind explaining the reason for the down-vote? I'm here to learn too. – Mike Jul 19 '11 at 16:13
1

The metaphone case could also follow the same structure proposed by Mike for the strict case.

I do not think that an additional similarity function is needed, because the purpose of the metaphone should be to give us a key that is common to words that sound the same.
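To illustrate what "a key that is common to words that sound the same" means in practice, here is a minimal sketch. `sounds_alike()` is a hypothetical helper name introduced only for this illustration; it relies on nothing beyond PHP's built-in `metaphone()`.

```php
<?php
// Hypothetical helper: two words "sound alike" when their
// Metaphone keys are identical, so no distance function is needed.
function sounds_alike($a, $b)
{
    return metaphone($a) === metaphone($b);
}

// Compare a misspelling and an unrelated word against 'India'.
var_dump(sounds_alike('India', 'Indiuh'));
var_dump(sounds_alike('India', 'China'));
```

Comparing keys for equality keeps the match rule simple, at the cost of missing words whose keys differ by even one character.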

$array1 = array('India','USA','China');
$array2 = array(
    'Indiuh is in east',
    'United States of America is USA',
    'Gandhi was born in India',
    'Made in China'
);
$found = array();
foreach ($array1 as $key => $value) {
    $found[$value] = preg_grep('/\b'.$value.'\b/i', $array2);
}

var_export($found);

echo "\n\n";

function meta( $sentence )
{
    return implode(' ', array_map('metaphone', explode(' ', $sentence)));
}

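$array2meta = array_map('meta', $array2);
$foundmeta = array(); // initialise before filling it in the loop
foreach ($array1 as $key => $value) {
    $valuemeta = meta($value);
    // Match the Metaphone key of the search term as a whole word.
    $foundmeta[$value] = preg_grep('/\b'.$valuemeta.'\b/', $array2meta);
    // Map the matching keys back to the original sentences.
    $foundmeta[$value] = array_intersect_key($array2, $foundmeta[$value]);
}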

var_export($foundmeta);

The above code prints out:

array (
  'India' => 
  array (
    2 => 'Gandhi was born in India',
  ),
  'USA' => 
  array (
    1 => 'United States of America is USA',
  ),
  'China' => 
  array (
    3 => 'Made in China',
  ),
)

array (
  'India' => 
  array (
    0 => 'Indiuh is in east',
    2 => 'Gandhi was born in India',
  ),
  'USA' => 
  array (
    1 => 'United States of America is USA',
  ),
  'China' => 
  array (
    3 => 'Made in China',
  ),
)
aercolino
  • I like the idea of performing the metaphone comparison on a per-word basis. I wasn't sure what the OP wanted, but your solution makes sense. I had to introduce the Levenshtein function to handle the differences in Metaphone results between the words and the sentences. – Mike Jul 19 '11 at 16:11
0
$a1 = array('India','USA','China');
$a2 = array('India is in east','United States of America is USA','Made in China');


foreach ( $a2 as $a )
{
  foreach( $a1 as $b  )
  {
    if ( strpos( $a, $b ) !== false ) // strpos() returns false when not found
    {
      echo $a . " contains " . $b . "\n";
    }
  }
}
Steve Claridge