
I have Python code running on multiple analyzer machines. Each machine picks documents from Solr (select operations) and modifies data in Solr by re-submitting the documents with fields updated from the DB (in the case of an update/insert). But since the different Solr instances on the different machines each have their own updated documents, this leads to data inconsistency across the machines.

Is there any way I can keep a central Solr document repository that is queried and updated by the different machines, thereby ensuring data consistency?

Kratos85

2 Answers


The Solr forums have multiple threads on concurrent Solr adds/updates that should give you a clearer picture.

You can maintain a single instance of Solr and have multiple clients commit to it.
Solr is not transactional like an RDBMS, but it does handle concurrency.
Whenever a commit is made, a lock is held so that other commits can't proceed and are queued instead.
A single commit can also flush all pending updates at once.
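
For example, here is a minimal sketch of what each analyzer machine could run against one shared core, assuming a recent Solr with the JSON select/update API over HTTP (the solr-central host, documents core, status field, and the fetch_pending/resubmit helpers are placeholders, not part of the original setup):

    import requests

    # Hypothetical central Solr core shared by all analyzer machines.
    SOLR_BASE = "http://solr-central:8983/solr/documents"

    def fetch_pending(batch_size=100):
        """Select documents that still need analysis from the central core."""
        resp = requests.get(
            f"{SOLR_BASE}/select",
            params={"q": "status:pending", "rows": batch_size, "wt": "json"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["response"]["docs"]

    def resubmit(docs):
        """Re-submit updated documents; Solr serializes concurrent commits itself."""
        resp = requests.post(
            f"{SOLR_BASE}/update",
            params={"commit": "true"},
            json=docs,
            timeout=30,
        )
        resp.raise_for_status()

    if __name__ == "__main__":
        docs = fetch_pending()
        for doc in docs:
            doc.pop("_version_", None)   # drop Solr's internal field before re-adding
            doc["status"] = "analyzed"   # fields pulled from your DB would go here
        if docs:
            resubmit(docs)

Because every machine talks to the same core, each select sees the documents committed by the others, which is exactly the consistency you are after.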

Jayendra
  • 52,349
  • 4
  • 80
  • 90

You are doing this the wrong way.

Solr is perfectly capable of running with a single master server that receives all updates and many replica servers that serve all search queries. That way all servers stay identical, as long as you don't have too many replicas and network bandwidth isn't constrained for any of them.

You would still have your update processes, but they would only update the core(s) on the master server. The replica servers get their updates automatically via Solr's replication capability.
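
As a rough illustration of that split, here is a sketch of how the Python update processes could be pointed at the master while searches go to the replicas (the solr-master/solr-replica hostnames, the documents core, and the random load balancing are placeholder assumptions; the replication itself is configured in solrconfig.xml, not in client code):

    import random
    import requests

    # Hypothetical topology: one master for writes, replicas for reads.
    MASTER = "http://solr-master:8983/solr/documents"
    REPLICAS = [
        "http://solr-replica1:8983/solr/documents",
        "http://solr-replica2:8983/solr/documents",
    ]

    def update_docs(docs):
        """All writes go to the master core only; replication fans them out."""
        resp = requests.post(
            f"{MASTER}/update",
            params={"commit": "true"},
            json=docs,
            timeout=30,
        )
        resp.raise_for_status()

    def search(query, rows=10):
        """Reads can hit any replica; pick one at random as a crude load balancer."""
        base = random.choice(REPLICAS)
        resp = requests.get(
            f"{base}/select",
            params={"q": query, "rows": rows, "wt": "json"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["response"]["docs"]

The only client-side rule to enforce is that no analyzer machine ever posts updates to a replica URL.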

Start off by reading the Solr wiki page on replication.

Michael Dillon