Are there any benchmarks showing what inter-node latency is acceptable for an Ignite cluster to run stably?
We currently run a single cluster that spans several availability zones (AZs) in the same region. The cloud provider quotes an inter-AZ latency of roughly 0.4 to 1 ms.
We have observed that on hosts where the inter-AZ latency is higher (above 0.8 ms), memory on the server nodes starts growing exponentially. We brought this under control by setting the message queue limit (messageQueueLimit) and the slow client queue limit (slowClientQueueLimit) to 1024 and 1023 respectively, which kept memory in check.
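These two limits are properties of Ignite's TcpCommunicationSpi. For reference, a minimal sketch of setting them programmatically, assuming Ignite 2.x on the classpath; only the TcpCommunicationSpi and IgniteConfiguration setters/getters are Ignite API, while the class CommunicationLimitsConfig and its helper methods are hypothetical names used for illustration:

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class CommunicationLimitsConfig {
    // Values taken from this question; they are not general recommendations.
    static final int MESSAGE_QUEUE_LIMIT = 1024;
    static final int SLOW_CLIENT_QUEUE_LIMIT = 1023;

    /** Builds a TcpCommunicationSpi with bounded message queues. */
    static TcpCommunicationSpi communicationSpi() {
        TcpCommunicationSpi spi = new TcpCommunicationSpi();
        // Limits the queue of incoming and outgoing messages per connection (0 disables the limit),
        // so the send queue towards a slow peer stays bounded.
        spi.setMessageQueueLimit(MESSAGE_QUEUE_LIMIT);
        // Servers drop any client whose outbound queue exceeds this size (0 disables the check).
        // Kept no higher than the message queue limit.
        spi.setSlowClientQueueLimit(SLOW_CLIENT_QUEUE_LIMIT);
        return spi;
    }

    /** Server-node configuration that plugs in the SPI above (hypothetical helper). */
    static IgniteConfiguration serverConfiguration() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCommunicationSpi(communicationSpi());
        return cfg;
    }

    public static void main(String[] args) {
        TcpCommunicationSpi spi = communicationSpi();
        System.out.println("messageQueueLimit=" + spi.getMessageQueueLimit()
            + ", slowClientQueueLimit=" + spi.getSlowClientQueueLimit());
        // Pass serverConfiguration() to Ignition.start(...) when starting a server node.
    }
}
```

Setting slowClientQueueLimit slightly below messageQueueLimit, as here, means a lagging client is dropped rather than left to hold a full queue on the server.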
However, client nodes are now failing with: "Client node outbound message queue size exceeded slowClientQueueLimit, the client will be dropped (consider changing 'slowClientQueueLimit' configuration property)".
As a result, these client nodes continuously disconnect and reconnect, and no processing gets through.
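To make this failure mode concrete, here is a deliberately simplified, hypothetical model of the check behind that warning; it is not Ignite's code. It assumes the server enqueues a fixed number of messages per tick for a client, the client connection drains a fixed number per tick, and the client is dropped as soon as the backlog exceeds slowClientQueueLimit. The class SlowClientModel and the per-tick rates are invented for illustration:

```java
/**
 * Hypothetical, simplified model of the slow-client check (not Ignite's implementation).
 * Each tick the server enqueues enqueuedPerTick messages for one client and the client
 * connection drains drainedPerTick; once the backlog exceeds the limit, the client is dropped.
 */
public class SlowClientModel {
    /** Returns the first tick at which the backlog exceeds the limit, or -1 if it never does within maxTicks. */
    static int ticksUntilDrop(int enqueuedPerTick, int drainedPerTick, int slowClientQueueLimit, int maxTicks) {
        long backlog = 0;
        for (int tick = 1; tick <= maxTicks; tick++) {
            backlog += enqueuedPerTick;
            // The drain cannot take more messages than are queued.
            backlog = Math.max(0, backlog - drainedPerTick);
            if (backlog > slowClientQueueLimit) {
                return tick;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Drain keeps up with the server: backlog stays at 0, the client is never dropped.
        System.out.println(ticksUntilDrop(100, 100, 1023, 10_000));
        // Drain is 10% slower: backlog grows by 10 per tick and exceeds 1023 at tick 103.
        System.out.println(ticksUntilDrop(100, 90, 1023, 10_000));
    }
}
```

In this model, any sustained gap between the enqueue and drain rates trips the limit eventually, and a larger limit only delays the drop (a limit of 2047 moves it from tick 103 to tick 205). If the real clients behave similarly, raising slowClientQueueLimit alone may not stop the reconnect loop.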
Has any benchmarking been done for Ignite, or is there documentation stating that, for a stable Ignite cluster, the latency between nodes must not exceed x ms?
If this is instead an issue in our application, I would like to understand how to troubleshoot it or work around it.
TIA