
My application indexes discussion threads. Each entry in the discussion is indexed as a separate Lucene document with a common_id field which can be used to group search hits into one discussion.

Currently, when a search is performed and a thread has 3 matching entries, 3 separate hits are returned. Even though this is correct, from the user's point of view the same discussion appears in the results multiple times.

Is there a way to tell Lucene to group its search results by the common_id field before returning them?

Roman

3 Answers


I believe what you are asking for is Field Collapsing, which is a feature of Solr (and I believe Elasticsearch as well).

If you want to roll your own, one possible way to do this is:

  1. Add a "series id" field to each document that is a member of a series. You will have to ensure that this gets incremented for every new series.
  2. Make an initial query to Lucene, and get a hit list.
  3. For each hit, check whether it has a series id; if it does, run another query on that series id to retrieve all the members of the series.
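A minimal plain-Java sketch of the three steps above, assuming an in-memory list stands in for the index. `Doc`, `search`, `membersOf` and `searchGrouped` are hypothetical names, and the substring match is only a placeholder for a real Lucene query; the point is the control flow (one member lookup per series, each series listed once). It uses records, so it needs Java 16+.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SeriesLookup {
    // Hypothetical stand-in for a Lucene document: an id, the series id
    // (null for documents that are not part of a series) and some text.
    record Doc(int id, String seriesId, String text) {}

    // Step 2: the initial query. A naive substring match stands in for a
    // real Lucene query; hits come back in index order.
    static List<Doc> search(List<Doc> index, String term) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : index) {
            if (d.text().contains(term)) hits.add(d);
        }
        return hits;
    }

    // Step 3 helper: the "query by series id" that fetches every member.
    static List<Doc> membersOf(List<Doc> index, String seriesId) {
        List<Doc> members = new ArrayList<>();
        for (Doc d : index) {
            if (seriesId.equals(d.seriesId())) members.add(d);
        }
        return members;
    }

    // Steps 2 and 3 combined: each series appears once, in the order its
    // first hit was seen; documents without a series keep their own entry.
    static Map<String, List<Doc>> searchGrouped(List<Doc> index, String term) {
        Map<String, List<Doc>> result = new LinkedHashMap<>();
        for (Doc hit : search(index, term)) {
            if (hit.seriesId() == null) {
                result.put("doc-" + hit.id(), List.of(hit));
            } else if (!result.containsKey(hit.seriesId())) {
                result.put(hit.seriesId(), membersOf(index, hit.seriesId()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> index = List.of(
            new Doc(1, "t1", "lucene grouping question"),
            new Doc(2, "t1", "reply about solr"),
            new Doc(3, "t1", "another lucene reply"),
            new Doc(4, null, "standalone lucene note"));
        searchGrouped(index, "lucene").forEach((key, docs) ->
            System.out.println(key + " -> " + docs.size() + " docs"));
    }
}
```

In a real index, `membersOf` would be a second Lucene query on the series id field, so this approach costs one extra query per distinct series in the hit list.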

An alternative is to store the ids of all the series members in a field inside each member's document.
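A sketch of that alternative, again in plain Java with hypothetical names (`Doc`, `memberIds`, `collapse`): because each hit already carries the ids of its series' members, the ranked hit list can be collapsed in a single pass with no second query.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StoredMembers {
    // Hypothetical document: its own id plus a stored list of the ids of
    // every member of its series (including itself).
    record Doc(int id, List<Integer> memberIds) {}

    // Collapses a ranked hit list so each series is reported once, at the
    // rank of its first (best) hit. Member ids come straight from the
    // stored field, so no follow-up query is needed.
    static List<List<Integer>> collapse(List<Doc> rankedHits) {
        Set<Integer> seen = new HashSet<>();
        List<List<Integer>> series = new ArrayList<>();
        for (Doc hit : rankedHits) {
            if (seen.contains(hit.id())) continue; // series already reported
            seen.add(hit.id());
            seen.addAll(hit.memberIds());
            series.add(hit.memberIds());
        }
        return series;
    }

    public static void main(String[] args) {
        List<Doc> hits = List.of(
            new Doc(3, List.of(1, 2, 3)),
            new Doc(5, List.of(5)),
            new Doc(1, List.of(1, 2, 3)));
        System.out.println(collapse(hits)); // prints [[1, 2, 3], [5]]
    }
}
```

The trade-off moves to indexing: when a new entry joins a thread, every existing member's stored list has to be rewritten, and since Lucene cannot modify a stored field in place, that means reindexing those documents.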

Kara
Yuval F

Since version 3.2, Lucene supports grouping search results by a field through its grouping module: http://lucene.apache.org/core/4_1_0/grouping/org/apache/lucene/search/grouping/package-summary.html
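As a rough model of what grouping returns (the top N groups, each with its top M documents), here is a plain-Java sketch of two-pass grouping over an already-scored hit list. It is not the module's API: `Hit`, `group`, `groupLimit` and `docsPerGroup` are hypothetical names, and the package summary linked above documents the real classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoPassGrouping {
    // Hypothetical scored hit: document id, value of the group field
    // (common_id in the question) and relevance score.
    record Hit(int docId, String commonId, float score) {}

    // Returns up to groupLimit groups, best group first, each holding up to
    // docsPerGroup hits sorted by descending score.
    static Map<String, List<Hit>> group(List<Hit> hits, int groupLimit, int docsPerGroup) {
        // Pass 1: rank groups by the score of their best hit.
        Map<String, Float> best = new LinkedHashMap<>();
        for (Hit h : hits) {
            best.merge(h.commonId(), h.score(), Float::max);
        }
        List<String> topGroups = best.entrySet().stream()
            .sorted(Map.Entry.<String, Float>comparingByValue().reversed())
            .limit(groupLimit)
            .map(Map.Entry::getKey)
            .toList();

        // Pass 2: collect the best hits inside each selected group.
        Map<String, List<Hit>> result = new LinkedHashMap<>();
        for (String g : topGroups) {
            List<Hit> members = new ArrayList<>();
            for (Hit h : hits) {
                if (h.commonId().equals(g)) members.add(h);
            }
            members.sort(Comparator.comparingDouble(Hit::score).reversed());
            result.put(g, members.subList(0, Math.min(docsPerGroup, members.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit(1, "t1", 0.9f),
            new Hit(2, "t2", 0.8f),
            new Hit(3, "t1", 0.7f),
            new Hit(4, "t1", 0.4f),
            new Hit(5, "t3", 0.3f));
        // Top 2 threads, at most 2 entries shown per thread.
        group(hits, 2, 2).forEach((g, docs) -> System.out.println(g + " " + docs));
    }
}
```

Capping the entries per group keeps a busy thread from crowding other threads off the results page, which is the behaviour the question asks for.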

Yossi Vainshtein

There is nothing built into Lucene that collapses results based on a field. You will need to implement that yourself.

However, this feature has recently been built into Solr.

See http://www.lucidimagination.com/blog/2010/09/16/2446/

bajafresh4life