
I have an Ignite cluster of 3 baseline nodes and multiple client nodes. Each baseline node is configured with a custom implementation of the IndexingSpi.

The IndexingSpi JavaDoc does not explain how the SPI methods are called on the nodes in the cluster.

Can anyone explain this in detail?

What I have observed is that when a client node puts an element into a cache (which has 2 backup copies per partition), the IndexingSpi.store method is called on all 3 baseline nodes.

When a client node uses an SpiQuery to query the cache, the IndexingSpi.query method is called on all 3 server nodes, and the client receives an iterator over the concatenated results from the 3 nodes (i.e. every matching element appears 3 times).

In my implementation of the IndexingSpi, I use Elasticsearch as indexing service.

Because of the way the IndexingSpi works, each cache entity is indexed 3 times, once by every server node, and when a query is executed, the result contains each element 3 times.

I also tried using a cluster-wide semaphore to restrict execution of the IndexingSpi methods to a single baseline node in the cluster. However, in that scenario the query method on the nodes that do not hold the permit returns an empty iterator, and the client nodes making the query sometimes get an empty result.

Overall my experience with the IndexingSpi has been very poor. I will very much appreciate some help to figure out how this SPI is working in a clustered environment.

Thank you in advance. Assen.

1 Answer


A cache configured with two backups means you have three copies of the data in your cluster. With three nodes, every node is going to have a copy of every entry.

That is what's happening with your IndexingSpi implementation. When you add a record, each node saves that record, so store is called three times for it. Maybe you could "save" to the index only on the node that holds the primary partition? But would you be able to recover from node failures?

For a query, the expectation is that each node returns only its primary records that match the search criteria.

Stephen Darlington
  • Thank you for your answer. Let me elaborate further: the implementation of the IndexingSpi uses an Elasticsearch cluster to index the cache's entities. The Elasticsearch cluster is independent of the Ignite topology, and the data stored (indexed) in it is persistent and survives the loss of an Ignite node (or partition). If I follow your suggestion and store (index) the entity only on the primary partition node, I'd need to figure out inside IndexingSpi.store() whether the object to be indexed is in a primary partition on the given node. Any idea how to check that? – Assen Sharlandjiev Feb 23 '22 at 19:49
  • `ignite.affinity("CACHE_NAME")` has a number of methods that can get you information about primaries/backups. – Stephen Darlington Feb 24 '22 at 09:18
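To illustrate the primary-only idea discussed above, here is a minimal, Ignite-free sketch in plain Java. Everything in it is hypothetical and only models the behaviour: `ExternalIndex` stands in for Elasticsearch, `NodeIndexer.store` stands in for one server node's `IndexingSpi.store`, and `toyIsPrimary` is a toy ownership rule, not Ignite's real affinity function. In an actual SPI implementation the check would presumably be something like `ignite.affinity(cacheName).isPrimary(ignite.cluster().localNode(), key)`. Note that the primary owner of a key can change when the topology changes, which the sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

class PrimaryOnlyIndexing {

    /** Hypothetical stand-in for an external index such as Elasticsearch. */
    static final class ExternalIndex {
        final List<String> docs = new ArrayList<>();

        void index(String doc) {
            docs.add(doc);
        }
    }

    /** Toy ownership rule (an assumption): exactly one node is "primary" for each key. */
    static boolean toyIsPrimary(int nodeId, Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount) == nodeId;
    }

    /** Models the store() logic running on one server node. */
    static final class NodeIndexer {
        final int nodeId;
        final int nodeCount;
        final boolean primaryOnly;
        final ExternalIndex index;

        NodeIndexer(int nodeId, int nodeCount, boolean primaryOnly, ExternalIndex index) {
            this.nodeId = nodeId;
            this.nodeCount = nodeCount;
            this.primaryOnly = primaryOnly;
            this.index = index;
        }

        void store(Object key) {
            // With the filter enabled, backup copies skip the external index.
            if (primaryOnly && !toyIsPrimary(nodeId, key, nodeCount)) {
                return;
            }
            index.index(String.valueOf(key));
        }
    }

    /** Simulates a put with full replication: store() fires on every node for every key. */
    static int indexedCount(int nodeCount, List<String> keys, boolean primaryOnly) {
        ExternalIndex index = new ExternalIndex();
        List<NodeIndexer> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new NodeIndexer(i, nodeCount, primaryOnly, index));
        }
        for (String key : keys) {
            for (NodeIndexer node : nodes) {
                node.store(key);
            }
        }
        return index.docs.size();
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d");
        System.out.println(indexedCount(3, keys, false)); // 12: every key indexed by all 3 nodes
        System.out.println(indexedCount(3, keys, true));  // 4: every key indexed once
    }
}
```

The same kind of check could be applied on the query side, so that each node returns only the hits for which it is the primary owner and the client no longer sees triplicates.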