1

In all the literature I've seen, the scalability of SolrCloud appears to concern querying only. That is, replication and sharding distribute the load of client queries across more CPUs and more bandwidth.

But what about Indexing?

Does SolrCloud's scalability improve indexing performance? Can it be configured to speed up indexing time? In my case, we need to commit new content to the index frequently; does that special case change anything?

Mark Miller's presentation from Lucene Revolution 2012 is fascinating and covers some details of indexing. But it seems that certain cloud features (like replication) could conceivably make indexing slower, not faster. Has anyone tried SolrCloud?

ted.strauss
  • I have been "trying" SolrCloud for some time, but honestly I can't say I am satisfied with it. It has some weird behaviours; you can find one I came across here: http://stackoverflow.com/questions/13485885/solrcloud-is-detecting-non-existing-nodes . To speed up indexing, you are better off tuning the configs. I don't think SolrCloud itself slows indexing down; it usually comes down to the configs. – denizdurmus Nov 23 '12 at 00:26

2 Answers

0

Well, I was finally able to set up a proper cloud environment for testing and, briefly, indexing speed is doomed even with RAMDirectory. I don't know whether the indexing speed is related to the number of followers in the cloud or to the number of collections, but a structure of 1 leader and 2 followers with 8 collections makes indexing 4 to 5 times slower. On a standalone instance I can index around 3.5M docs in 17 minutes, while with the same configs for each instance in the cloud I can only index 650K docs in 17 minutes. I am not sure how to speed up SolrCloud indexing, and I am somewhat surprised to see my expectations about the cloud destroyed one by one as I keep hitting new bugs and problems while working on it.
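As a rough check of the figures above, the two runs can be converted to throughput. This is only a sketch of the arithmetic; it assumes the 3.5M figure is the standalone run and the 650K figure is the cloud run, and the function name is made up for illustration.

```python
def docs_per_second(docs: int, minutes: float) -> float:
    """Average indexing throughput over a run of the given length."""
    return docs / (minutes * 60)

# Figures quoted in the answer; both runs lasted 17 minutes.
standalone = docs_per_second(3_500_000, 17)
cloud = docs_per_second(650_000, 17)
slowdown = standalone / cloud

print(f"standalone: {standalone:.0f} docs/s")  # standalone: 3431 docs/s
print(f"cloud:      {cloud:.0f} docs/s")       # cloud:      637 docs/s
print(f"slowdown:   {slowdown:.1f}x")          # slowdown:   5.4x
```

So the quoted numbers work out to a bit more than a 5x slowdown, slightly above the "4 to 5 times" estimate.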

If this happens with other setups too, I don't understand the point of using the cloud for Solr. I mean, if indexing time rises this much, I can reindex everything on a classical standalone Solr instance much faster.

It would be really nice to hear about other experiences with SolrCloud, if anyone has tried it or runs it in a real environment.

denizdurmus
  • How many nodes do you have in your cloud? Would you mind sharing your hardware config? I am struggling to increase ingestion speed :( – Rahul Sharma Jan 22 '16 at 21:34
  • @RahulSharma It has been a long time since I stopped testing this setup, so I don't remember the configs and hardware in detail :/ If you create a question with details, maybe I can help, or at least other people here definitely will... – denizdurmus Jan 24 '16 at 10:35
  • Thanks @Stephan, I asked a question about this but I haven't got any answer so far - http://stackoverflow.com/questions/34936008/solrcloud-does-it-matter-if-i-have-even-or-odd-number-of-shards – Rahul Sharma Jan 25 '16 at 16:15
0

Which version of Solr are you using for SolrCloud? SolrCloud has been very stable since the Solr 4.8 release.

  1. You can increase indexing speed by not hard committing documents frequently; instead, commit in batches, i.e. every 45 or 60s. This can be done with the autoCommit setting in solrconfig.xml.

  2. A hard commit ensures that data is flushed to stable storage, but (with openSearcher set to false) it does not make the changes visible to searches; that is what a soft commit does. Set the soft commit interval to around 90-120s. This can also be done in solrconfig.xml, with the autoSoftCommit setting.
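The two settings above might look like this in solrconfig.xml. This is a minimal sketch, not a tuned configuration: the 60s and 120s values are simply taken from the ranges suggested above, `maxTime` is in milliseconds, and your existing `updateHandler` element may already contain other settings.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush to stable storage every 60s without opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: make newly indexed documents visible to searches every 120s -->
  <autoSoftCommit>
    <maxTime>120000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With a setup like this, clients should avoid sending an explicit commit with every update request, since that would override the batching.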

Vijay Tiwary