
I have a requirement for a document management system that handles PDF, Word, XLS and PPT files with semantic search.

I started looking into Elasticsearch for this and stumbled on Apache Jackrabbit, and subsequently on OpenKM and Hippo. Even though core features like versioning exist in Jackrabbit, I need some pointers on how to go about this. I need help with the following concerns:

  • Should I just use Elasticsearch with the Elasticsearch attachment plugin, or use Jackrabbit with a MySQL backend and use Elasticsearch to index the documents?
  • Or should I use OpenKM?
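For the Elasticsearch-only option, a minimal sketch of the request bodies involved, assuming the ingest attachment processor (the successor to the older mapper-attachments plugin; depending on the Elasticsearch version it ships as the ingest-attachment plugin or is built in). The processor reads base64-encoded file bytes from a field and extracts text and metadata with Apache Tika. The pipeline id `attachment`, the index name and the `filename` field are hypothetical choices for illustration; the code only builds the JSON and does not contact a cluster.

```python
import base64
import json


def attachment_pipeline() -> dict:
    """Body for: PUT _ingest/pipeline/attachment (pipeline id is hypothetical).

    The "attachment" processor decodes the base64 in "data" and writes the
    extracted text and metadata into an "attachment" object on the document.
    """
    return {
        "description": "Extract text from uploaded office documents",
        "processors": [{"attachment": {"field": "data"}}],
    }


def attachment_document(filename: str, raw: bytes) -> dict:
    """Body for: PUT <index>/_doc/<id>?pipeline=attachment

    "filename" is a hypothetical extra field kept for display purposes;
    "data" carries the raw file bytes, base64-encoded as the processor expects.
    """
    return {
        "filename": filename,
        "data": base64.b64encode(raw).decode("ascii"),
    }


if __name__ == "__main__":
    print(json.dumps(attachment_pipeline(), indent=2))
    print(json.dumps(attachment_document("report.pdf", b"%PDF-1.4 sample"), indent=2))
```

Note that by default the base64 payload stays in the indexed document, so the index ends up holding the file bytes as well; that alone does not give you versioning or access control, which is part of why the question below about a separate primary store comes up.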

Any pointers would be greatly appreciated. This will ultimately also require app integration.
Update: Logically, using Elasticsearch for search makes sense, but I figure I cannot use it as the primary data source.

  • What are the best options for primary storage? Apache Jackrabbit with MySQL?
  • As all these features are prebuilt in OpenKM, would it be a better option?
Eyal.Dahari
tx fun

1 Answer


What do you want to achieve? Are you looking to manage making the documents available, or is it about managing the content of the documents? Elasticsearch, like any search engine, is generally not a primary data source.
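A minimal sketch of the split implied here, assuming a separate primary store (for example Jackrabbit on MySQL) as the source of truth and a search engine holding only derived data. Both classes are hypothetical in-memory stand-ins, not the Jackrabbit or Elasticsearch APIs; the point is that the store keeps every version, while the index can be dropped and rebuilt from the store at any time.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryStore:
    """Hypothetical stand-in for the source of truth (e.g. Jackrabbit on MySQL).

    Keeps every version of every document, so nothing is lost if the
    search index is discarded.
    """
    versions: dict[str, list[str]] = field(default_factory=dict)

    def save(self, doc_id: str, text: str) -> int:
        history = self.versions.setdefault(doc_id, [])
        history.append(text)
        return len(history)  # 1-based version number

    def latest(self) -> dict[str, str]:
        return {doc_id: h[-1] for doc_id, h in self.versions.items()}


@dataclass
class SearchIndex:
    """Hypothetical stand-in for Elasticsearch: derived, disposable data."""
    postings: dict[str, set[str]] = field(default_factory=dict)
    doc_terms: dict[str, set[str]] = field(default_factory=dict)

    def remove(self, doc_id: str) -> None:
        # Drop the document's previous terms so updates do not leave stale hits.
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)

    def index(self, doc_id: str, text: str) -> None:
        self.remove(doc_id)
        terms = set(text.lower().split())
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, term: str) -> set[str]:
        return set(self.postings.get(term.lower(), set()))


def rebuild(store: PrimaryStore) -> SearchIndex:
    """Recreate the whole index from the latest version of each document."""
    idx = SearchIndex()
    for doc_id, text in store.latest().items():
        idx.index(doc_id, text)
    return idx


if __name__ == "__main__":
    store = PrimaryStore()
    store.save("contract-1", "draft terms")
    store.save("contract-1", "final signed terms")
    idx = rebuild(store)
    print(idx.search("signed"), idx.search("draft"))
```

Searching the rebuilt index finds only the latest text, while older versions remain retrievable from the store; losing or reindexing the search side never loses documents.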

I can't give you any advice regarding OpenKM (neither for nor against). Whether Hippo is a match depends on your use case, which I would need to know more about.

Jasper Floor