
I'm using lunr to perform a search, and I currently highlight matches using the value of the search text area. However, lunr uses a stemmer, so it returns results that don't contain the exact search term. Is there a way to access the stem of the search term that lunr actually searches on?

// query our lunr index
searchResults = _.map(index.search($('#searchInput').val()), function (res) {
    var uid = res.ref;
    return mediaList[uid];
});
Justin Lee



The default stemmer that lunr uses is available as the function lunr.stemmer.

You can call it yourself with whatever token you want to stem, e.g.

lunr.stemmer("stemming") //= "stem"
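To get the stems the question asks about, you could apply the stemmer to each word of the search box value. This is a sketch, not part of lunr: stemSearchTerms is a hypothetical helper, the stemming function is passed in (with lunr you would pass lunr.stemmer), and splitting on whitespace is an assumption that only approximates lunr's own handling of the query, which also runs the trimmer and stop word filter.

```javascript
// Hypothetical helper: split a raw search string into lower-cased terms
// and stem each one. `stem` is the stemming function (lunr.stemmer with lunr).
// Whitespace splitting is an assumption; lunr's tokenizer may split differently.
function stemSearchTerms(query, stem) {
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter(function (term) { return term.length > 0; }) // drop empty strings
    .map(function (term) { return stem(term); });
}

// Usage in the page from the question (requires jQuery and lunr):
// var stems = stemSearchTerms($('#searchInput').val(), lunr.stemmer);
```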

However, I don't think this will help you achieve what you want, because the tokens in the documents you are searching have also been stemmed, and stemming is a one-way operation. In the example above, you cannot tell which other words would also have been stemmed to "stem", so you could miss some terms that should be highlighted.

A workaround is to keep your own reverse stem lookup, so that you can later match the resulting search terms in the output more easily. You can do this by inserting a custom pipeline function into your index:

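// map each stemmed word to the list of original words that produced it;
// Object.create(null) avoids clashes with inherited keys such as "constructor"
var reverseStem = Object.create(null)

var reverseStemIndexBuilder = function (token) {
  var stemmed = lunr.stemmer(token)

  if (!(stemmed in reverseStem)) {
    reverseStem[stemmed] = []
  }

  // record each original word only once, however often it appears
  if (reverseStem[stemmed].indexOf(token) === -1) {
    reverseStem[stemmed].push(token)
  }

  return stemmed
}

// register the function so an index that uses it can still be serialised
lunr.Pipeline.registerFunction(reverseStemIndexBuilder, 'reverseStemIndexBuilder')

// idx is your instance of a lunr index; change its pipeline before adding documents.
// The existing stemmer can be removed because reverseStemIndexBuilder already returns a stemmed token.
idx.pipeline.remove(lunr.stemmer)
idx.pipeline.add(reverseStemIndexBuilder)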

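Once your documents have been added, reverseStem lets you look up every original word that produced a given stem. Stem the user's search terms, look up the matching words in reverseStem, and highlight any of them that appear in your results.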
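A minimal sketch of that last step, assuming reverseStem has been filled by the pipeline function above and the text you highlight is plain text. highlightTerms and escapeRegExp are hypothetical helpers, and the stemmer is passed in as a parameter (lunr.stemmer in practice). The function collects every original word that shares a stem with a query term and wraps whole-word matches in <mark> tags.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap every whole word in `text` that matches a query
// term, or any original word recorded under that term's stem, in <mark> tags.
//   text        - plain text to highlight
//   query       - raw search string typed by the user
//   stem        - stemming function (lunr.stemmer when used with lunr)
//   reverseStem - map of stem -> original words, filled by the pipeline function
function highlightTerms(text, query, stem, reverseStem) {
  var words = Object.create(null);

  query.toLowerCase().split(/\s+/).forEach(function (term) {
    if (!term) return; // skip empty strings from leading/trailing whitespace
    words[term] = true;
    var key = stem(term);
    var originals = Object.prototype.hasOwnProperty.call(reverseStem, key)
      ? reverseStem[key]
      : [];
    originals.forEach(function (w) { words[w] = true; });
  });

  var alternatives = Object.keys(words)
    .sort(function (a, b) { return b.length - a.length; }) // try longer words first
    .map(escapeRegExp);
  if (alternatives.length === 0) return text;

  var pattern = new RegExp('\\b(?:' + alternatives.join('|') + ')\\b', 'gi');
  return text.replace(pattern, '<mark>$&</mark>');
}

// Hypothetical usage (assumes each mediaList entry has a `title` field):
// var html = highlightTerms(mediaList[uid].title, $('#searchInput').val(), lunr.stemmer, reverseStem);
```

Note that \b only recognises ASCII word characters, so words with accented letters may not be matched at their edges. If the text comes from user content and is inserted as HTML, escape it before highlighting.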

Oliver Nightingale