I’m using Whoosh with Haystack. Haystack does not expose Whoosh’s keyword extraction, so I’m calling Whoosh directly for this feature:
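@property
def keywords(self):
    # Reuse the Whoosh backend that Haystack has already configured.
    whoosh_backend = SearchForm().searchqueryset.query.backend
    # Make sure the index is set up before opening a searcher.
    if not whoosh_backend.setup_complete:
        whoosh_backend.setup()
    with whoosh_backend.index.searcher() as searcher:
        # key_terms_from_text yields (term, score) pairs for the 'text' field.
        return dict(searcher.key_terms_from_text('text', self.text, normalize=False))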
The result for this article:
{'attribut': 0.5051433470901747, 'donor': 0.5002458466718036, 'chariti': 1.0, 'factor': 0.5061597699646493, 'impact': 0.7091847877188343}
Unfortunately the keyword extraction in Whoosh appears to return stemmed keywords. Is there a way to get unstemmed versions of them, say, the most frequent or the shortest form? (The spelling corrector somehow manages to return unstemmed words.) Thanks!
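To make "the most frequent or the shortest form" concrete, here is a minimal sketch of the kind of mapping I have in mind. It is not something Whoosh provides as far as I know: `unstemmed_forms` and `toy_stem` are hypothetical names, and `toy_stem` is only a crude stand-in so the snippet runs without Whoosh. In practice the `stem` argument would have to be the same stemming function the field's analyzer uses (presumably `whoosh.lang.porter.stem` if the field uses `StemmingAnalyzer` with its defaults, but that is an assumption).

```python
import re
from collections import Counter, defaultdict


def unstemmed_forms(text, stem, pick="most_frequent"):
    """Map each stem to one surface form that actually occurs in `text`.

    `stem` is any callable word -> stem; it must match the analyzer's stemmer
    for the result to line up with the extracted keywords (assumption).
    """
    forms = defaultdict(Counter)
    # Lowercase and split on runs of letters; a real analyzer may tokenize differently.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        forms[stem(word)][word] += 1
    if pick == "shortest":
        # Shortest form, ties broken alphabetically.
        return {s: min(c, key=lambda w: (len(w), w)) for s, c in forms.items()}
    # Most frequent form, ties broken by length and then alphabetically.
    return {s: min(c, key=lambda w: (-c[w], len(w), w)) for s, c in forms.items()}


def toy_stem(word):
    """Crude stand-in stemmer for demonstration only (NOT the Porter algorithm)."""
    for suffix in ("ies", "y", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + ("i" if suffix in ("ies", "y") else "")
    return word
```

With such a mapping, the stemmed keys could be swapped out afterwards, e.g. `{forms.get(term, term): score for term, score in self.keywords.items()}`, falling back to the stem when no surface form is found. This re-tokenizes the text outside of Whoosh, though, so I would prefer a built-in way if one exists.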