0

Possible Duplicate:
PHP MySQL Search And Order By Relevancy

Hi,

I have a table with several columns containing fields like name, address, company etc. Lets say someone search for "microsoft john". I want the results containing "microsoft" should appear first, then results containing john. vice versa if query is "john microsoft"

My php code is:

$searchitems=explode(" ", $trimmed);
//print_r($searchitems);
$so = $_GET['so']=='2'?"2":"1";

$clause = $so=='2'?"AND":"OR";
include("dbconnect.php");
// Build SQL Query

$query = "select FirstName,LastName,course,Department,batch,City,companyjob,companylocation,
    companyposition,coursename,institutename,coursename2,institutename2,coursename3,
    institutename3 from alumni WHERE ";
for($i=0;$i<count($searchitems);$i++)
{
    $queryappend .= "(FirstName LIKE '".$searchitems[$i]."%' OR LastName LIKE '".$searchitems[$i]."%'
    OR City LIKE '".$searchitems[$i]."%' OR CountryorRegion LIKE '".$searchitems[$i]."%'
    OR companyjob LIKE '".$searchitems[$i]."%' OR companylocation LIKE '".$searchitems[$i]."%'
    OR coursename LIKE '".$searchitems[$i]."%' OR institutename LIKE '".$searchitems[$i]."%'
    OR coursename2 LIKE '".$searchitems[$i]."%' OR institutename2 LIKE '".$searchitems[$i]."%')";
    if($i<count($searchitems)-1) $queryappend .= $clause;

}
$query .=$queryappend;

The problem is MYSQL is ordering the results by id... This makes it funny, because some higher valued results may be stuck deep in the stack. btw, phpmyadmin search has the same flaw.

Please suggest.

Community
  • 1
  • 1
hiprakhar
  • 1,007
  • 3
  • 12
  • 15
  • 2
    Not sure it's a flaw in either PHP or in phpmyadmin. I don't see __any__ ORDER BY clause in your statement at all, so the returned order will always be purely arbitrary (not even guaranteed to be by id) – Mark Baker Feb 10 '11 at 10:46
  • Thanks Mark for the reply. I need to ORDER BY the sequence of words in the search query. As I said in my example, if the first $searchitems is microsoft, all results containing microsoft should be displayed first. (microsoft results may further be ordered by the next $searchitems and so on). I cannot add any arbitrary order by as that would spoil the purpose. – hiprakhar Feb 10 '11 at 10:53
  • The page you referred http://stackoverflow.com/questions/344199/php-mysql-search-and-order-by-relevancy is simply asking to search individual words from the query string. I achieved that effect by using explode() function – hiprakhar Feb 10 '11 at 10:57
  • 2
    wow, that's what I call a poor man's full text search ... well, you'll certainly be poor after paying the monthly bill of your dedicated search server farm :) – sfussenegger Feb 10 '11 at 11:24
  • @sfussenegger You mean I should jump to fulltext search and this complicated stuff? Will that be less costly in terms of time and space complexity? – hiprakhar Feb 10 '11 at 11:30
  • @hiprakhar yes, you almost certainly should – sfussenegger Feb 10 '11 at 11:31

2 Answers2

2

As an example:

SELECT
  FirstName,
  LastName,
  IF (FirstName LIKE '%Microsoft%' || LastName LIKE '%Microsoft%', 1, 0) AS One,
  IF (FirstName LIKE '%John%' || LastName LIKE '%John%', 1, 0) AS Two
FROM alumni
ORDER BY One DESC, Two DESC

In your code, this will make the query pretty complicated. The advantage is, that items with both search term appear before items that match only a single search term.

An alternative is sorting the records into buckets while retrieving them using PHP. Assuming you have the search terms in an array $search (ordered by descending priority):

while ($record = mysql_fetch_array($result))
{
  $total = join(' ', $record);
  $found = false;
  foreach ($search as $term)
  {
    if (strpos($total, $term) !== false)
    {
      $buckets[$term][] = $record;
      $found = true;
      break;
    }
  }
  if (!$found)
  {
    $results[] = $record;
  }
}
foreach (array_reverse($search) as $term)
{
  if (isset($buckets[$term]))
  {
    $result = array_merge($buckets[$term], $result);
  }
}

Now you have the results in array $results. Note that this demonstrates the algorithm, it it not tuned for performance.

Oswald
  • 31,254
  • 3
  • 43
  • 68
  • Thanks @oswald!! That worked! thanks a lot for putting your precious time to write the code... But probably I will also pay heed to @sfussenegger and will shift to fulltext search. Will fulltext search be more efficient than your first and second (bucket) method ? – hiprakhar Feb 10 '11 at 11:39
  • Probably. Otherwise fulltext search would not be included in mysql. – Oswald Feb 10 '11 at 11:43
  • I have finally changed my search system. now the query after WHERE is MATCH (FirstName,LastName,Department,City,CountryorRegion,companyjob,companylocation, coursename,institutename,coursename2,institutename2,altemail,addcomments,Name,WebPage) AGAINST ('$trimmed' WITH QUERY EXPANSION) I am still experimenting more with the score part of fulltext to make it more robost. Thanks to everyone who helped me on this! Special thanks to @symcbean giving the nice idea of levenshtein... It was a nice read... This exercise will help me with my interviews after my undergrad degree! – hiprakhar Feb 12 '11 at 11:39
1

I would think the simplest way to solve it would be sorting the results by the levenstein distance.

Something like....

$queryappend="ORDER BY
   length(firstname) - levenshtein(FirstName, '".$searchitems[$i]."') +
   length(lastname)  - levenstein(LastName, '".$searchitems[$i]."')   +
   length(City)      - levenstein(City, '".$searchitems[$i]."')       +
   ...

Although it might be a good idea to use a schema MORE SUITED to this kind of searching

symcbean
  • 47,736
  • 6
  • 59
  • 94