
I am using Sphinx to provide search to a website and I've run across a bit of a snag when returning relevant results.

To keep my question simple, let's assume that I have two fields, @title and @body, which are weighted 100 & 15 respectively. When I search for small words like the word 'in' I would like to have it rank exact matches for that search term higher and then check for matches to 'in*|*in|*in*' and rank them slightly lower. Is there any way to have this type of specificity for your searches?

Example results for 'in':

  1. Indian Food
  2. In The Middle
  3. Document about Latin

Some relevant settings are:

In sphinx.conf:

morphology              = stem_en
charset_type            = utf-8
min_word_len            = 2
min_prefix_len          = 0
min_infix_len           = 2
enable_star             = 1

In search.php:

$sp->SetMatchMode( SPH_MATCH_EXTENDED2 );
$sp->SetRankingMode( SPH_RANK_PROXIMITY_BM25 );
$sp->SetFieldWeights ( array('title' => 100, 'body' => 15) );

Also, as a side note: I've also had some instances where partial matches don't even show up in the search results. For example, I have searched for Cow but Cowboy does not show up as a result. I have also searched for Cowb and Cowbo and it wasn't until I typed Cowboy that I received the expected result. Any thoughts?


This question is along the same lines as this previous SO question, but I hope I've given a little more detail as to my problem and the things I've tried to warrant a solution.

ServAce85

2 Answers


It looks like, morphologically, "Cow" is not related to "Cowboy", so the stemmer does not treat one as a form of the other.

You could solve it in two ways:

  1. Use a wordforms file containing the mapping Cow > Cowboy.
  2. Since enable_star is on, you could change the query from "Cow" to "Cow*", which will find all words starting with "Cow".
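Option 2 can be applied automatically to user input. The sketch below is a minimal illustration, assuming enable_star = 1 and prefix/infix indexing as in the question's config; the helper name addPrefixWildcards is hypothetical. It appends "*" to every term and strips characters that extended syntax would treat as operators:

```php
<?php
// Hypothetical helper (not part of sphinxapi.php): turn "Cow" into "Cow*" so
// that, with enable_star = 1 and infixes indexed, "Cowboy" also matches.
function addPrefixWildcards(string $query): string
{
    // Split the raw input on whitespace, dropping empty pieces.
    $terms = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    $out = [];
    foreach ($terms as $term) {
        // Keep only letters and digits so input cannot inject
        // extended-syntax operators such as @, |, ( or ".
        $clean = preg_replace('/[^\p{L}\p{N}]/u', '', $term);
        if ($clean !== '') {
            $out[] = $clean . '*';
        }
    }
    return implode(' ', $out);
}

echo addPrefixWildcards('Cow'), "\n";         // Cow*
echo addPrefixWildcards(' cow  boy! '), "\n"; // cow* boy*
```

Terms shorter than min_infix_len (2 in the question's config) may not match anything as wildcards, so very short input might need special handling.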

Regarding different ranking for "in" and "*in*", I would suggest having two body fields in the index, say body and body_star, both filled with the same content from the body column.

In search.php:

$sp->SetRankingMode( SPH_RANK_PROXIMITY_BM25 );
$sp->SetMatchMode( SPH_MATCH_EXTENDED2 );
$sp->SetFieldWeights ( array('title' => 20, 'body' => 15, 'body_star' => 5) );
// '|' ORs the field-limited groups; separating them with plain spaces would AND them
$sp->Query("(@title in) | (@body in) | (@body_star *in*)");

This should do the trick.
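To use the same idea for arbitrary search words rather than a hard-coded "in", the query string can be generated. This is a minimal sketch, assuming body_star is a second full-text field holding a copy of body (for example by selecting the column twice in the source query, such as body AS body_star); the helper name buildExactThenInfixQuery is hypothetical. The call sequence at the end is commented out because it needs the sphinxapi.php client and a running searchd:

```php
<?php
// Hypothetical helper: build an SPH_MATCH_EXTENDED2 query that matches the
// whole word in title/body and the word as an infix in body_star.
function buildExactThenInfixQuery(string $word): string
{
    // Strip anything that is not a letter or digit so user input cannot
    // inject extended-syntax operators into the query.
    $w = preg_replace('/[^\p{L}\p{N}]/u', '', $word);
    if ($w === '') {
        return '';
    }
    // OR the three field-limited groups: a document needs to match only one,
    // and with field weights a whole-word match in title/body should outrank
    // an infix-only match in the lower-weighted body_star.
    return "(@title $w) | (@body $w) | (@body_star *$w*)";
}

echo buildExactThenInfixQuery('in'), "\n";
// (@title in) | (@body in) | (@body_star *in*)

// Assumed usage with the PHP client, not executed here:
// $sp->SetMatchMode(SPH_MATCH_EXTENDED2);
// $sp->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
// $sp->SetFieldWeights(array('title' => 20, 'body' => 15, 'body_star' => 5));
// $result = $sp->Query(buildExactThenInfixQuery('in'));
```

Keeping body_star at a low weight is the design choice that pushes "Document about Latin" below documents containing the whole word.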

Iaroslav Vorozhko

I love reading a solution that seems elegant. I'll have to play around with the 'cowboy' problem a bit, but I really like your suggestion as to how to solve the 'in' problem. Great suggestion! (That is... until someone comes along and tells me differently ;) ) I'll accept it as the correct answer when I test it, assuming it works. – ServAce85 Aug 27 '11 at 19:48

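You could also set the expand_keywords option in your config (http://sphinxsearch.com/docs/1.10/conf-expand-keywords.html), which expands each query keyword into its exact and wildcard forms (roughly running → ( running | *running* | =running )), and set the ranking mode to SPH_RANK_SPH04, which additionally boosts matches at the start of a field and exact whole-field matches (http://sphinxsearch.com/blog/2010/08/17/how-sphinx-relevance-ranking-works/).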

Ris90