We've been using Elasticsearch in the system. Although I've used its analyzers and queries, I haven't gone deep into its indexing. As of now, I don't know how far ES lets us work with the Lucene (inverted) indexes it keeps in its shards.
We're now looking at a range of NLP features, NER for one, and Stanford NLP appealed to us.
There doesn't seem to be a plug-in that makes these two packages work together(?)
I haven't had a deep look into Stanford NLP. However, as far as I saw, it works entirely on its own indexes: whichever object or type is passed to it, Stanford NLP indexes it itself and goes from there.
This would make the system maintain two different indexes for the same set of documents, those of ES and those of Stanford NLP, and this would be costly.
Is there a way to get around this?
One scenario I have in mind: make Stanford NLP work on the Lucene segments, i.e. the inverted indexes that ES has already built. In this case:
1.) Does Stanford NLP use Lucene indexes without re-indexing anything for itself? I don't know Stanford NLP's indexing structure, or even how far it does or doesn't use Lucene.
2.) Are there any restrictions on using the Lucene indexes in ES shards? Would we hit a wall if we used these Lucene segments directly as is, bypassing ES?
I'm trying to put things together; it's all up in the air for now. Sorry for the naive question.
I'm aware of OpenNLP and its plug-in. I haven't checked, but I'm guessing it wouldn't be "double-indexing" and would use ES's indexes(?) However, it's Stanford NLP we're after.
TIA.