I have a requirement for a document management system that handles PDF, Word, XLS, and PPT files with semantic search.
I started looking into Elasticsearch for this and stumbled on Apache Jackrabbit, and subsequently on OpenKM and Hippo. Even though core features like versioning exist in Jackrabbit, I need some pointers on how to go about this.
I need help navigating through the following concerns:
- Should I just use Elasticsearch with the Elasticsearch attachment plugin, or use Jackrabbit with a MySQL backend and use Elasticsearch to index the documents?
- Or should I use OpenKM?
Any pointers would be greatly appreciated. This will eventually require integration with an application.
Update: Logically, using Elasticsearch for search makes sense, but I figure that I cannot use it as the primary data source.
- What are the best options for primary storage? Apache Jackrabbit with MySQL?
- As all features are prebuilt in OpenKM, would this be a better option?
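
To make the split I have in mind concrete, here is a minimal in-memory sketch of the idea: a primary store that owns the documents and their versions, plus a search index that only holds derived data and can be rebuilt from the primary store at any time. Everything in it is hypothetical and for illustration only: `PrimaryStore` stands in for something like Jackrabbit, `SearchIndex` stands in for Elasticsearch, and neither class uses a real client API. It also does plain keyword matching, not semantic search, and assumes text has already been extracted from the files.

```python
from collections import defaultdict


class PrimaryStore:
    """Hypothetical stand-in for the system of record (e.g. a content repository)."""

    def __init__(self):
        # doc_id -> list of extracted-text versions, oldest first
        self._versions = defaultdict(list)

    def save(self, doc_id, text):
        """Append a new version and return its 1-based version number."""
        self._versions[doc_id].append(text)
        return len(self._versions[doc_id])

    def latest(self, doc_id):
        return self._versions[doc_id][-1]

    def ids(self):
        return list(self._versions)


class SearchIndex:
    """Hypothetical stand-in for the search engine: derived data only."""

    def __init__(self):
        # term -> set of doc ids containing that term
        self._postings = defaultdict(set)

    def index(self, doc_id, text):
        """Replace whatever was indexed for doc_id with the given text."""
        self.remove(doc_id)
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def remove(self, doc_id):
        for ids in self._postings.values():
            ids.discard(doc_id)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


def store_document(store, index, doc_id, text):
    """Write to the primary store first, then update the search index."""
    version = store.save(doc_id, text)
    index.index(doc_id, text)
    return version


def rebuild_index(store):
    """Recreate the index from the primary store, e.g. after index loss."""
    index = SearchIndex()
    for doc_id in store.ids():
        index.index(doc_id, store.latest(doc_id))
    return index
```

With this layout, losing the index is recoverable because `rebuild_index` only needs the primary store, which is the reason I do not want the search engine to be the only copy of the data.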