I am wondering which of the scenarios below (or which combination of them) would be best for my application in terms of performance, scalability, and high availability.

Here is my application:

Suppose I am going to have more than 10m documents, and the number grows every day (it will probably reach more than 100m documents within a year). I want to use Solr to index these documents, but the problem is that some of my data fields can change fairly frequently (not too often, but they can change).

Scenarios:

1- Using SolrCloud as the database for all data (even the fields that can change).

2- Using SolrCloud as the database for static data and an RDBMS (such as Oracle) for storing the dynamic fields.

3- Using SolrCloud integrated with Hadoop (HDFS + MapReduce) for all data.

Best regards.

Ali

1 Answer

I'm not sure how well SolrCloud works with DIH (the DataImportHandler); you might face a situation where indexing happens on only one instance.

On the other hand, I would store the data in an RDBMS, because from time to time you will need to reindex Solr to add new functionality to the index.

At the end of the day, I would use DB + Solr (with all the fields indexed), together with either Hadoop (I have not used it yet) or some other piece of software to post data into SolrCloud.
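
As a rough illustration of the "other piece of software" idea, here is a minimal Python sketch that turns changed database rows into Solr atomic-update documents and posts them to the JSON update handler. It is only a sketch under assumptions: the unique key is assumed to be called `id`, the field names (`status`, `price`), the collection name `mycollection`, and the URL are all hypothetical, and atomic updates generally need the update log enabled and the other fields stored so that Solr can rebuild the document.

```python
import json
import urllib.request


def build_atomic_update(doc_id, changed_fields):
    """Build one Solr atomic-update document that sets only the changed fields.

    Assumes the collection's unique key field is named "id" (an assumption).
    """
    doc = {"id": doc_id}
    for name, value in changed_fields.items():
        # {"set": value} asks Solr to replace just this field's value.
        doc[name] = {"set": value}
    return doc


def post_updates(solr_update_url, docs):
    """POST a batch of update documents to Solr's JSON update handler."""
    body = json.dumps(docs).encode("utf-8")
    request = urllib.request.Request(
        solr_update_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical rows read from the RDBMS: (document id, changed fields).
    changed_rows = [
        ("doc-1", {"status": "archived"}),
        ("doc-2", {"price": 19.99, "status": "active"}),
    ]
    docs = [build_atomic_update(doc_id, fields) for doc_id, fields in changed_rows]
    print(json.dumps(docs, indent=2))
    # Hypothetical URL; uncomment to send to a running SolrCloud node:
    # post_updates("http://localhost:8983/solr/mycollection/update?commit=true", docs)
```

The RDBMS stays the source of truth here, so a full reindex remains possible whenever the schema changes; the atomic updates only keep the frequently changing fields searchable in between.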

Fuxi
  • What if I need to search on dynamic fields that are stored in the RDBMS? This solution would only work if I do not need any indexing over the dynamic fields. Am I right? – Ali May 26 '14 at 14:13
  • I would advise indexing those dynamic fields too. For a while I worked on a system where some of the fields were in the DB only, and that was a bit of a pain in the a*. There are some approaches to make those fields work for you in Solr :) I would need to know a bit more to advise on that subject. – Fuxi May 26 '14 at 15:21