I'm building a metasearch engine and I'm stuck. Using PHP, I send a query to three search engines and pull the top 10 URLs from each one. I then store these URLs in a 2D array with a corresponding score for aggregation purposes, i.e. the 1st result gets 20 points, the 2nd gets 18 points, and so on.

In the following example I query the search engines with 'php' and get these results:

Blekko

Array ( [url] => php.about.com/ [score] => 20 )
Array ( [url] => php.net/ [score] => 18 )
Array ( [url] => en.wikipedia.org/wiki/PHP [score] => 16 )
Array ( [url] => www.phpbuilder.com/ [score] => 14 )
Array ( [url] => blekko.com/ws/http://php.about.com/+/seo [score] => 12 )
Array ( [url] => www.w3schools.com/php/default.asp [score] => 10 )
Array ( [url] => phpnuke.org/ [score] => 8 )
Array ( [url] => www.symfony-project.org/ [score] => 6 )
Array ( [url] => www.phpconference.co.uk/ [score] => 4 )

Entireweb

Array ( [url] => phpnuke.org/ [score] => 20 )
Array ( [url] => www.aardvarktopsitesphp.com/ [score] => 18 )
Array ( [url] => www.php.net/ [score] => 16 )
Array ( [url] => www.php.net/downloads.php [score] => 14 )
Array ( [url] => php.net/manual [score] => 12 )
Array ( [url] => www.php.net/manual/en/ [score] => 10 )
Array ( [url] => www.php.net/docs.php [score] => 8 )
Array ( [url] => www.php.net/license/ [score] => 6 )
Array ( [url] => www.phplinkdirectory.com/ [score] => 4 )

Bing

Array ( [url] => www.php.net/ [score] => 20 )
Array ( [url] => en.wikipedia.org/wiki/PHP [score] => 18 )
Array ( [url] => www.php.net/downloads.php [score] => 16 )
Array ( [url] => www.w3schools.com/php/default.asp [score] => 14 )
Array ( [url] => windows.php.net/download [score] => 12 )
Array ( [url] => windows.php.net/ [score] => 10 )
Array ( [url] => www.tizag.com/phpT/ [score] => 8 )
Array ( [url] => wiki.php.net/ [score] => 6 )
Array ( [url] => qa.php.net/ [score] => 4 )
Array ( [url] => www.php.com/ [score] => 2 )

What I'd like to do is combine all these results, remove duplicate URLs while adding their scores together, and create a new list of aggregated results that might look something like:

Array ( [url] => www.php.net/ [score] => 54 )

Array ( [url] => en.wikipedia.org/wiki/PHP [score] => 34 )

Array ( [url] => www.w3schools.com/php/default.asp [score] => 24 )

etc.

I'm just looking for the most efficient way to achieve this. Any advice would be very much appreciated. Thanks.

shanahobo86

1 Answer

1- Normalize the URLs before comparing them, so that you can tell that www.php.net and php.net are the same website (www.php.net and php.net/downloads.php also belong to the same site).

2- Give more points to the results returned by Bing. Bing is, as you know, the most semantic search engine.

3- You could also capture the result titles and store them in the arrays. This is a personal recommendation.
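
Points 1 and 2 could be sketched roughly as follows. This is only an illustration, assuming PHP 7 or later: `normalizeUrl`, `siteOf`, `weightedScore`, and the multipliers in `$engineWeights` are hypothetical names and placeholder values, not anything from the original code.

```php
<?php
// Hypothetical helper: reduce a URL to a comparison key so that
// "www.php.net/" and "php.net/" end up identical.
function normalizeUrl(string $url): string
{
    // Drop a leading scheme, if there is one.
    $url = preg_replace('#^https?://#i', '', trim($url));
    // Split into host and path; only the host is case-insensitive.
    $parts = explode('/', $url, 2);
    $host = preg_replace('#^www\.#', '', strtolower($parts[0]));
    $path = isset($parts[1]) ? rtrim($parts[1], '/') : '';
    return $path === '' ? $host : $host . '/' . $path;
}

// Hypothetical helper: host part only, for treating every page
// of a site (e.g. php.net/downloads.php) as the same website.
function siteOf(string $url): string
{
    return explode('/', normalizeUrl($url), 2)[0];
}

// Assumed per-engine multipliers; the numbers are placeholders to tune.
$engineWeights = ['Bing' => 1.5, 'Blekko' => 1.0, 'Entireweb' => 0.75];

// Scale a raw rank score by the weight of the engine that returned it.
// Engines missing from the table keep their raw score.
function weightedScore(int $score, string $engine, array $weights): float
{
    return $score * ($weights[$engine] ?? 1.0);
}
```

Whether to compare full normalized URLs or only the host is a design choice: `normalizeUrl` keeps php.net/downloads.php as its own entry, while `siteOf` folds it into php.net.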

ActuallyMAB
Thanks for the advice. Bing certainly returns much more relevant results than Entireweb. Is there an easy way to add an integer from one 2D array to that of another 2D array? I guess that would get me started on combining the results. – shanahobo86 Jun 28 '12 at 13:59
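
One possible way to add the scores across the arrays is to use the URL itself as the key of an intermediate array and sum into it. The following is a minimal sketch, assuming PHP 7 or later and rows shaped like `['url' => ..., 'score' => ...]` as in the question. `urlKey` and `aggregate` are hypothetical names, and the sample rows are a small subset of the results listed above.

```php
<?php
// Hypothetical helper: strip "www." and trailing slashes so that
// "www.php.net/" and "php.net/" share one key.
function urlKey(string $url): string
{
    $parts = explode('/', trim($url), 2);
    $host = preg_replace('#^www\.#', '', strtolower($parts[0]));
    $path = isset($parts[1]) ? rtrim($parts[1], '/') : '';
    return $path === '' ? $host : $host . '/' . $path;
}

// Merge any number of result lists, summing the scores of duplicates.
function aggregate(array $resultSets): array
{
    $totals = [];
    foreach ($resultSets as $results) {
        foreach ($results as $row) {
            $key = urlKey($row['url']);
            // Start from 0 the first time a URL is seen, then accumulate.
            $totals[$key] = ($totals[$key] ?? 0) + $row['score'];
        }
    }
    arsort($totals); // highest total first, keys preserved

    // Convert back to the list-of-rows shape used in the question.
    $merged = [];
    foreach ($totals as $url => $score) {
        $merged[] = ['url' => $url, 'score' => $score];
    }
    return $merged;
}

// Subset of the sample data from the question.
$blekko = [
    ['url' => 'php.net/', 'score' => 18],
    ['url' => 'en.wikipedia.org/wiki/PHP', 'score' => 16],
];
$entireweb = [
    ['url' => 'www.php.net/', 'score' => 16],
];
$bing = [
    ['url' => 'www.php.net/', 'score' => 20],
    ['url' => 'en.wikipedia.org/wiki/PHP', 'score' => 18],
];

$merged = aggregate([$blekko, $entireweb, $bing]);
print_r($merged);
```

With this subset, php.net totals 18 + 16 + 20 = 54 and en.wikipedia.org/wiki/PHP totals 16 + 18 = 34, matching the figures in the question. Note that the merged list holds the normalized key (php.net), not the original www.php.net/ string. Each URL is visited once, so the cost grows linearly with the number of results, plus the final sort.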