
I am trying to write a data-driven spell-correction component that uses the edit distance algorithm to get an initial list of suggestions (alternatives) for each token. I can think of two ways to do this:

1. Export the inverted index that Solr (6.5) builds from the feed, which I can use later in Python to get the initial list of suggestions (alternatives) for my spell correction.
2. Connect to Solr from my Python program to get the list of alternatives/suggestions for each token.

Now the question is, how can I do this?
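For option 2, a minimal sketch of what I have in mind, assuming a local Solr core named `mycore`, the TermsComponent request handler exposed at `/terms` (as in the default example configs), and an indexed text field named `content` (all placeholder names): the Python side pulls the indexed terms for the field and then filters them by edit distance.

```python
import requests

SOLR_TERMS_URL = "http://localhost:8983/solr/mycore/terms"  # placeholder host/core

def fetch_terms(field, limit=100000):
    """Fetch the indexed terms for `field` via Solr's TermsComponent."""
    params = {"terms.fl": field, "terms.limit": limit, "wt": "json"}
    resp = requests.get(SOLR_TERMS_URL, params=params)
    resp.raise_for_status()
    # Response shape: {"terms": {"content": ["term1", freq1, "term2", freq2, ...]}}
    flat = resp.json()["terms"][field]
    return flat[::2]  # keep the terms, drop the document frequencies

def edit_distance(a, b):
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def suggestions(token, terms, max_distance=2):
    """Initial alternatives for `token`: indexed terms within `max_distance` edits."""
    return sorted(t for t in terms if edit_distance(token, t) <= max_distance)

vocabulary = fetch_terms("content")
print(suggestions("spel", vocabulary))
```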

Saurav Malani
  • What's the reason to not query Solr? Why would recreating the functionality in Python be a plausible option? It's better to start with the _why_ instead of the _how_. – MatsLindh Jul 10 '19 at 08:43
  • Actually, the algorithm is still in the development stage, and at this point I don't want to add unnecessary complications because of Solr. I personally find Python easier to work with for now; once the whole spell-correction model is tested and ready, I may implement/integrate it in Solr. – Saurav Malani Jul 10 '19 at 09:34
  • In that case you can [use PyLucene](http://lucene.apache.org/pylucene/) or, for a slightly higher-level interface, [Lupyne](https://pypi.org/project/lupyne/). You should be able to point it directly to your Solr's index directory to read the tokens themselves. – MatsLindh Jul 10 '19 at 10:41
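For reference, a rough sketch of what reading the terms straight out of the index directory with PyLucene might look like, following the suggestion above. Class names follow the Lucene 6.x API (matching Solr 6.5); the index path and field name are placeholders, and Solr should not be writing to the index while it is read this way.

```python
import lucene
from java.nio.file import Paths
from org.apache.lucene.index import DirectoryReader, MultiFields
from org.apache.lucene.store import FSDirectory

lucene.initVM()

INDEX_DIR = "/var/solr/data/mycore/data/index"  # placeholder path
FIELD = "content"                               # placeholder field name

# Open the on-disk index and enumerate all indexed terms of the field.
directory = FSDirectory.open(Paths.get(INDEX_DIR))
reader = DirectoryReader.open(directory)
terms_enum = MultiFields.getTerms(reader, FIELD).iterator()

vocabulary = []
term = terms_enum.next()
while term is not None:
    vocabulary.append(term.utf8ToString())
    term = terms_enum.next()

reader.close()
print(len(vocabulary), "distinct terms")
```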

0 Answers