
We are using Solr 8.9.0 in SolrCloud mode with two shards, where each shard has one replica of type NRT (i.e., 2 shards and 2 replicas in total).

We need to insert and update the index frequently, and we also require near-real-time data.

We mainly search data in Solr for two purposes:

  1. Index searching and retrieval of Solr documents for an application that requires real-time updated data
  2. Index searching and retrieval of Solr documents for the API and reporting (receiving updated data 1-2 minutes late is acceptable)

So we are planning to create a replica of our SolrCloud cluster for usage no. 2 (API and reporting).

We learned that Cross Data Center Replication (CDCR) is available in 8.9.0, but it is deprecated and was removed in 9.0. Link: https://solr.apache.org/guide/8_9/cross-data-center-replication-cdcr.html

How can we sync data from one SolrCloud cluster to another? For example, if documents are updated in one cluster, they should also be updated in the other.

Is there any configuration or component we should look into for our use case?
Also posted on the user mailing list: https://lists.apache.org/thread/s330sopdzvv06qhvtk18y3gqz9yokbw0

Thanks in advance...

mcacorner



It seems that the objective is to separate instances for operational (no. 1) and analytics (no. 2) usage.

Before going further: since Solr no longer supports CDCR and does not offer Change Data Capture (CDC) as a source, replication between two clusters is trickier.

If the main goal is to increase overall performance, keep in mind that Solr is very flexible, and you may be able to scale your cluster horizontally by adding new shards.
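As a rough illustration of adding shards: with the default compositeId router, a new shard is usually obtained by splitting an existing one through the Collections API `SPLITSHARD` action (`CREATESHARD` only applies to collections that use the implicit router). The sketch below only builds the request URL; the helper name `split_shard_url`, the base URL and the collection name `mycollection` are hypothetical placeholders.

```python
from typing import Optional
from urllib.parse import urlencode


def split_shard_url(solr_base: str, collection: str, shard: str,
                    async_id: Optional[str] = None) -> str:
    """Build a Collections API SPLITSHARD request URL (hypothetical helper).

    SPLITSHARD divides one existing shard into two sub-shards; the parent
    shard becomes inactive once the split has completed.
    """
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    if async_id is not None:
        # Run the split asynchronously; progress can then be polled with
        # action=REQUESTSTATUS&requestid=<async_id>.
        params["async"] = async_id
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"
```

For example, `split_shard_url("http://localhost:8983/solr", "mycollection", "shard1")` targets `shard1` of `mycollection`. After the split, the inactive parent shard can be removed with the `DELETESHARD` action.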

If you absolutely need a new instance for other reasons (security, maintenance, geographical replication), one possibility is to write simultaneously to your two target Solr instances.

To keep them as synchronized as possible, this implies that you:

  • have full control of your write components
  • define rules for handling exceptions (for example: how to handle errors if you cannot write to one instance due to an availability or network issue)
  • define a strategy to copy all historical data from the existing instance to the new one. Depending on the data volume, this can take several days
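A minimal sketch of the dual-write approach, assuming every indexing path in your application goes through one component you control. Each batch of documents is sent to both clusters; when one cluster fails, the batch is kept in a per-cluster pending list so it can be retried later. `DualWriter`, `http_sender`, the host names and the collection name are hypothetical; the in-memory list stands in for what would have to be a durable queue in production, and the historical backfill is not covered here.

```python
import json
import urllib.request
from typing import Callable, Dict, List

Doc = dict
Sender = Callable[[List[Doc]], None]


def http_sender(update_url: str, timeout: float = 10.0) -> Sender:
    """Return a function that POSTs a JSON array of documents to a Solr
    /update URL (hypothetical helper)."""
    def send(docs: List[Doc]) -> None:
        req = urllib.request.Request(
            update_url,
            data=json.dumps(docs).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # urlopen raises (HTTPError, URLError, timeout) on failure;
        # DualWriter treats any exception as a failed write.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
    return send


class DualWriter:
    """Send every batch to all targets; keep failed batches per target."""

    def __init__(self, targets: Dict[str, Sender]):
        self.targets = targets
        self.pending: Dict[str, List[List[Doc]]] = {name: [] for name in targets}

    def write(self, docs: List[Doc]) -> Dict[str, bool]:
        results: Dict[str, bool] = {}
        for name, send in self.targets.items():
            if self.pending[name]:
                # Queue behind older failures to preserve update order.
                self.pending[name].append(docs)
                results[name] = False
                continue
            try:
                send(docs)
                results[name] = True
            except Exception:
                self.pending[name].append(docs)
                results[name] = False
        return results

    def retry_pending(self) -> None:
        for name, send in self.targets.items():
            queue = self.pending[name]
            while queue:
                try:
                    send(queue[0])
                except Exception:
                    break  # still unavailable; keep remaining batches in order
                queue.pop(0)


# Example wiring (hypothetical hosts and collection):
# writer = DualWriter({
#     "operational": http_sender("http://solr-a:8983/solr/mycollection/update?commitWithin=1000"),
#     "reporting": http_sender("http://solr-b:8983/solr/mycollection/update?commitWithin=60000"),
# })
```

New batches for a cluster that already has pending failures are queued rather than sent, so once the lagging cluster is back, `retry_pending()` replays updates in their original order and an older version of a document does not overwrite a newer one. The longer `commitWithin` on the reporting side reflects the 1-2 minute delay that usage no. 2 tolerates.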

Hope it helps.

Hakan
  • If we add more shards, won't searching become slower? – vishal patel May 03 '23 at 05:53
  • Nope, on the contrary, it can be a way to boost overall performance, provided it is correctly configured and benchmarked. Particular attention must be paid to the request replication strategy (sometimes it's better to execute the query locally on a shard to avoid network saturation, aggregation of results, …) – Hakan May 03 '23 at 21:28
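One way to read "execute the query locally on a shard": a request sent directly to a replica core with `distrib=false` is answered by that core alone, with no fan-out to other shards and no merging of results, while the `shards` parameter limits a distributed query to the named shards. The sketch below only builds `/select` URLs; `select_url`, the hosts, the collection `mycollection` and the core name `mycollection_shard1_replica_n1` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def select_url(base_url: str, q: str, rows: int = 10,
               local_only: bool = False, shards: Optional[str] = None) -> str:
    """Build a /select URL (hypothetical helper). base_url is a collection
    URL or, when local_only is set, the URL of one specific replica core."""
    params = {"q": q, "rows": rows}
    if local_only:
        # distrib=false: only the core receiving the request is searched;
        # no fan-out to other shards and no merging of results.
        params["distrib"] = "false"
    elif shards is not None:
        # Limit a distributed search to the listed logical shards.
        params["shards"] = shards
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"
```

For example, `select_url("http://solr-node1:8983/solr/mycollection_shard1_replica_n1", "status:open", local_only=True)` searches only that core. Keep in mind that a local query sees only that shard's slice of the index, so it fits cases where the routing (e.g. by document ID prefix) guarantees the relevant documents live on that shard.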