I'm in the process of creating a metasearch engine and I'm stuck! Using PHP, I send a query to 3 search engines and pull the top 10 URLs from each one. I then store these URLs in a 2D array with a corresponding score for aggregation purposes, i.e. the 1st result gets 20 pts, the 2nd gets 18 pts, etc.
So in the following example I query the search engines with 'php' and get these results:
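To make the scoring concrete, here is a minimal sketch of how the points could be assigned, assuming the URLs arrive as a plain list already ordered by rank. The function name `scoreUrls` and its `$topScore` / `$step` parameters are made up for illustration:

```php
<?php
// Hypothetical helper: turn a ranked list of URLs into url/score rows.
// Rank 0 gets $topScore, and each later rank loses $step points (never below 0).
function scoreUrls(array $urls, int $topScore = 20, int $step = 2): array
{
    $scored = [];
    foreach (array_values($urls) as $rank => $url) {
        $scored[] = [
            'url'   => $url,
            'score' => max(0, $topScore - $step * $rank),
        ];
    }
    return $scored;
}

// Example: the first three URLs get 20, 18 and 16 points.
$rows = scoreUrls(['php.about.com/', 'php.net/', 'en.wikipedia.org/wiki/PHP']);
```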
Blekko
Array ( [url] => php.about.com/ [score] => 20 )
Array ( [url] => php.net/ [score] => 18 )
Array ( [url] => en.wikipedia.org/wiki/PHP [score] => 16 )
Array ( [url] => www.phpbuilder.com/ [score] => 14 )
Array ( [url] => blekko.com/ws/http://php.about.com/+/seo [score] => 12 )
Array ( [url] => www.w3schools.com/php/default.asp [score] => 10 )
Array ( [url] => phpnuke.org/ [score] => 8 )
Array ( [url] => www.symfony-project.org/ [score] => 6 )
Array ( [url] => www.phpconference.co.uk/ [score] => 4 )
Entireweb
Array ( [url] => phpnuke.org/ [score] => 20 )
Array ( [url] => www.aardvarktopsitesphp.com/ [score] => 18 )
Array ( [url] => www.php.net/ [score] => 16 )
Array ( [url] => www.php.net/downloads.php [score] => 14 )
Array ( [url] => php.net/manual [score] => 12 )
Array ( [url] => www.php.net/manual/en/ [score] => 10 )
Array ( [url] => www.php.net/docs.php [score] => 8 )
Array ( [url] => www.php.net/license/ [score] => 6 )
Array ( [url] => www.phplinkdirectory.com/ [score] => 4 )
Bing
Array ( [url] => www.php.net/ [score] => 20 )
Array ( [url] => en.wikipedia.org/wiki/PHP [score] => 18 )
Array ( [url] => www.php.net/downloads.php [score] => 16 )
Array ( [url] => www.w3schools.com/php/default.asp [score] => 14 )
Array ( [url] => windows.php.net/download [score] => 12 )
Array ( [url] => windows.php.net/ [score] => 10 )
Array ( [url] => www.tizag.com/phpT/ [score] => 8 )
Array ( [url] => wiki.php.net/ [score] => 6 )
Array ( [url] => qa.php.net/ [score] => 4 )
Array ( [url] => www.php.com/ [score] => 2 )
What I'd like to do is combine all these results, remove duplicate URLs while adding their scores together, and create a new list of aggregated results that might look something like:
Array ( [url] => www.php.net/ [score] => 54 )
Array ( [url] => en.wikipedia.org/wiki/PHP [score] => 34 )
Array ( [url] => www.w3schools.com/php/default.asp [score] => 24 )
etc.
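A rough sketch of the aggregation described above, assuming PHP 7.4 or later and that each engine's results are an array of `['url' => ..., 'score' => ...]` rows. The function names `normalizeUrl` and `aggregateResults` are hypothetical. Stripping a leading `www.` is also an assumption: the 54 for www.php.net/ only adds up (20 + 16 + 18) if php.net/ and www.php.net/ count as the same URL. Keying an associative array by the normalized URL means each result is visited once, so the merge itself is linear in the number of results:

```php
<?php
// Hypothetical helper: make "www.php.net/" and "php.net/" compare equal.
// Only a leading "www." is dropped; the path is left untouched.
function normalizeUrl(string $url): string
{
    return preg_replace('/^www\./i', '', trim($url));
}

// Merge any number of result sets, summing the scores of duplicate URLs.
function aggregateResults(array ...$resultSets): array
{
    $totals = [];
    foreach ($resultSets as $results) {
        foreach ($results as $row) {
            $key = normalizeUrl($row['url']);
            if (!isset($totals[$key])) {
                // Keep the first spelling of the URL that was seen.
                $totals[$key] = ['url' => $row['url'], 'score' => 0];
            }
            $totals[$key]['score'] += $row['score'];
        }
    }

    // Drop the lookup keys and sort by score, highest first.
    $merged = array_values($totals);
    usort($merged, fn(array $a, array $b) => $b['score'] <=> $a['score']);
    return $merged;
}

// Cut-down version of the data above, just to show the call.
$blekko    = [['url' => 'php.net/', 'score' => 18], ['url' => 'en.wikipedia.org/wiki/PHP', 'score' => 16]];
$entireweb = [['url' => 'www.php.net/', 'score' => 16]];
$bing      = [['url' => 'www.php.net/', 'score' => 20], ['url' => 'en.wikipedia.org/wiki/PHP', 'score' => 18]];

$combined = aggregateResults($blekko, $entireweb, $bing);
print_r($combined);
```

With this cut-down input the first row comes out with a score of 54 and the Wikipedia row with 34. Because the first spelling seen is kept, the top row is shown as php.net/ rather than www.php.net/.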
I'm just looking for the most efficient way to achieve this. Any advice would be very much appreciated. Thanks!