I've read this and I'm still a bit confused about exactly how to go about it.

I have an unindexed field that counts the number of votes for each playlist in a set that is being searched. The main search works fine, but I also want to include the voting field in the scoring algorithm, and I'm not sure how to include a non-indexed field in it. Can anyone offer any guidance or an example?

1 Answer

You do not necessarily have to adapt the scoring algorithm (which implements tf-idf, by the way).

If you just want to integrate the number of votes into the scoring calculation, you can "boost" the search document before adding it to the index, e.g.:

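$doc = new Zend_Search_Lucene_Document();
// Linear boost: the document's boost grows in proportion to its vote count.
// Note that a document with 0 votes gets a boost of 0 here.
$boostFactor = 0.1;
$doc->boost = (float)$numberOfVotes * $boostFactor;
// ..
// Add the boosted document to the index and persist the change.
$index->addDocument($doc);
$index->commit();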

The boost factor in this example is not really relevant: since the vote count is your only boosting criterion, the factor scales every document's boost by the same constant. If you want to boost non-linearly, you could also apply exp or sqrt to $numberOfVotes.
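To make the difference between the curves concrete, here is a minimal sketch in plain PHP. The helper `voteBoost()` is hypothetical and not part of Zend_Search_Lucene; the `1.0 +` offset is also my own assumption, added so that a document with zero votes keeps a neutral boost instead of a boost of 0. The log curve is not mentioned above and is included only for comparison.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): maps a vote count
// to a boost value using one of three curves.
function voteBoost(int $votes, string $curve = 'sqrt'): float
{
    $votes = max(0, $votes); // treat negative counts as zero

    switch ($curve) {
        case 'linear':
            // Grows in step with the votes; heavily voted documents dominate.
            return 1.0 + $votes;
        case 'log':
            // Grows very slowly; votes act mostly as a tie-breaker.
            return 1.0 + log(1 + $votes);
        case 'sqrt':
        default:
            // Dampened growth: 100x the votes gives roughly 10x the boost.
            return 1.0 + sqrt($votes);
    }
}

// Compare the curves for a few vote counts.
foreach ([0, 10, 100, 10000] as $votes) {
    printf(
        "%6d votes: linear=%.1f sqrt=%.1f log=%.2f\n",
        $votes,
        voteBoost($votes, 'linear'),
        voteBoost($votes, 'sqrt'),
        voteBoost($votes, 'log')
    );
}
```

The result would then be assigned as `$doc->boost = voteBoost($numberOfVotes);` before calling `addDocument()`. A dampening curve such as sqrt or log is probably the safer choice when vote counts vary widely, because it keeps popular documents from outranking better text matches. exp does the opposite and grows so quickly that it is only practical for small, bounded inputs.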

But another question:

Why not use ElasticSearch (or another performant search engine) in the first place?

ElasticSearch, for example, is far more powerful and faster than the PHP implementation of Zend Lucene. It is also really easy to hook into its scoring mechanism, e.g. http://www.elasticsearch.org/guide/reference/query-dsl/custom-score-query.html, and you can use a PHP client like Elastica along with it.

sebwebdev
  • Thank you. This worked nicely. Would you mind explaining which cases are better suited to a linear boost versus an exp or sqrt boost? I would like to learn more about the pros and cons of each. I get the general idea of what an exp or sqrt boost does; I'm just looking for practical scenarios where you would use one over the other, so that I can better gauge which I should be using. I'm thinking that in this case I should use sqrt so that the votes do not overpower the search terms when dealing with a large number of votes. – user1935733 Jan 02 '13 at 21:04
  • Also, thanks for the link to ElasticSearch. I may look into switching search engines if I come across performance issues with Zend Lucene. I'm mainly using it because it happened to be one of the first search engines I came across that seemed to fit my needs, and it had a nice setup example to follow. – user1935733 Jan 02 '13 at 21:08