I would like to know whether using Lucene as a data store is recommended. I say 'recommended' because I already know that it's possible.

I am asking because the only Q&A I could find on SO was this one: "Lucene as data store", which is rather outdated (from 2010) even though it asks almost exactly the same question.

My main concern about keeping data exclusively in Lucene is storage reliability. I have been using Lucene since 2011, and back then (version 2.4) it was not unusual to encounter a CorruptIndexException, which basically meant the data was lost unless you had it somewhere else. In newer versions (4.x onward), however, I have never experienced any problem with Lucene indices.

Answers need not dwell on performance, as I already have a pretty good idea of what to expect there.

I am also open to hearing about Solr and Elasticsearch reliability experiences (how often shards fail, what options we have when that happens, etc.).

aymeric
  • I'm not really prepared to answer your question. My immediate concern, however, would be this: "the data would be lost if you didn't have it somewhere else". Any data you don't want to lose should always be recoverable from somewhere else. I wouldn't trust MySQL or Oracle with a data store I can't recover, either. – femtoRgon Mar 27 '16 at 21:47
  • That's a good point. I'm only considering reliability within the scope of normal operations. Of course regular backups can and should be done, but as you say, the same applies to any data store, so it's not the most relevant aspect here. – aymeric Mar 27 '16 at 22:25

2 Answers

This sounds like a good match for SolrCloud, as it is able and willing to handle the load and also takes care of the backup. My only concern would be that it is not a data store; it "only" handles the indexing of those documents.

Lefty G Balogh

We are using SolrCloud for data storage, and reliability has been pretty good so far. However, make sure that you configure and tune it well, or else you could find nodes failing and ZooKeeper being unable to detect some of them after some time.

Koustav Ray