
I wanted to understand how a NODE_LEFT event is triggered for an Apache Ignite grid.

  • Do nodes keep pinging each other constantly to detect whether other nodes are present, or do they ping each other only when required?
  • If a ping from a client node is unsuccessful, can that also trigger a NODE_LEFT event, or can it only be triggered by a server node?
  • Once a node has left, which node triggers the topology update event, i.e. PME? Can it be triggered by a client node, or only by server nodes?
Lokesh

1 Answer


Yes, nodes ping each other to verify the connection. Here is a more detailed explanation of how a node failure happens. You might also check this video.

The final decision to fail a node (remove it from the cluster) is made by the coordinator node, which issues a special event that has to be acknowledged by the other nodes (NODE_FAILED).

A node might also leave the cluster explicitly by sending a TcpDiscoveryNodeLeftMessage (i.e. triggering a NODE_LEFT event), for example when you stop it gracefully.

Only the coordinator node can change the topology version, meaning that a PME always starts on the coordinator and is propagated to the other nodes afterward.
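
For illustration, here is a minimal sketch of how these discovery events can be observed from Java. It assumes nothing beyond the standard events API; note that discovery events are not recorded by default, so the types of interest have to be enabled via IgniteConfiguration.setIncludeEventTypes:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class NodeLeftListener {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Discovery events are disabled by default; enable the ones we care about.
        cfg.setIncludeEventTypes(EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED);

        Ignite ignite = Ignition.start(cfg);

        // Local listener: fires on this node once the coordinator-confirmed
        // NODE_LEFT / NODE_FAILED event reaches it.
        IgnitePredicate<Event> lsnr = evt -> {
            DiscoveryEvent discoEvt = (DiscoveryEvent) evt;
            System.out.println(evt.name() + ": node " + discoEvt.eventNode().id()
                + ", topology version " + discoEvt.topologyVersion());
            return true; // keep listening
        };

        ignite.events().localListen(lsnr, EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED);
    }
}
```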

Alexandr Shapkin
  • Thank you for the details. If a node is processing jobs but becomes unreachable and a NODE_FAILED event is triggered, will its assigned jobs be reallocated to reachable nodes? And if the node becomes responsive again, say after a NODE_LEFT event has been triggered, can it rejoin the cluster? – Lokesh Feb 03 '23 at 15:53
  • Yes, it's configurable using the failover SPI: https://www.gridgain.com/docs/latest/developers-guide/distributed-computing/fault-tolerance (a configuration sketch is shown after this comment thread). – Alexandr Shapkin Feb 03 '23 at 16:04
  • A node can leave and re-enter the cluster as many times as required, but you might need to start it explicitly. In the case of a persistent data node and an untouched baseline, it won't even trigger a PME, or at least it should not. – Alexandr Shapkin Feb 03 '23 at 16:18
  • From the documentation, it seems JobStealingFailoverSpi uses the config params in JobStealingCollisionSpi to steal jobs from the queues of other nodes. However, when a node has crashed, will job stealing still work for the tasks of the crashed node? – Lokesh Feb 03 '23 at 17:46
  • Job stealing is for different needs. If a node with an active compute task fails, the task should be re-routed to another node by default. – Alexandr Shapkin Feb 03 '23 at 19:00
  • We have observed that when a node fails, a lot of tasks on that node error out with InterruptedException, so instead of moving to another node the task fails. Because of this, we had to create custom retry logic in the broker, so effectively the task is retried by our custom logic rather than by the grid's own failover. Is there a better way to handle this scenario? – Lokesh Feb 04 '23 at 05:18
  • I suppose we might be talking about different things. A failure of the client (or other compute initiator) should trigger a compute interruption, because there is no compute coordinator available anymore and no place to return the result. But a data node failure should be handled by the failover SPI, IMO. – Alexandr Shapkin Feb 04 '23 at 11:15
  • Anyway it seems to be a different discussion. – Alexandr Shapkin Feb 04 '23 at 11:32
  • During a graceful shutdown, I believe the flow should be: the terminating node sends a NODE_LEFT event -> all other nodes (servers + thick clients) receive the event and locally remove the node from the topology -> server nodes stop sending messages to the terminated node -> once all nodes have acknowledged the NODE_LEFT event, the coordinator node triggers PME and the topology version gets updated. Is this a correct understanding? – Lokesh Feb 19 '23 at 07:51
  • Yes, your understanding is correct. NODE_LEFT is triggered by the node that is stopping -> the coordinator receives it -> sends it across the ring -> waits for ACKs and changes the topology version (a runnable illustration of this flow is shown after the comments). – Alexandr Shapkin Feb 20 '23 at 13:23
  • I have a query regarding the ring here. There are two rings: the topology ring (servers + thick clients) and the server-only ring. When deciding on NODE_LEFT and NODE_FAILED, which ring is used, the server ring or the topology ring? My key concern is that if thick clients also take part in the decision making (i.e. by sending an ACK to the coordinator), then scaling the grid will run into issues, as a client that is supposed to send an ACK to the coordinator may go down before sending it back. – Lokesh Mar 05 '23 at 06:41
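
Regarding the failover SPI mentioned in the comments, here is a minimal configuration sketch, assuming Java-based configuration and otherwise default settings. AlwaysFailoverSpi is Ignite's default failover SPI; the sketch only makes the retry count explicit:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.failover.always.AlwaysFailoverSpi;

public class FailoverConfig {
    public static void main(String[] args) {
        // AlwaysFailoverSpi re-routes a job to another available node
        // when the node executing it fails or leaves the topology.
        AlwaysFailoverSpi failoverSpi = new AlwaysFailoverSpi();
        failoverSpi.setMaximumFailoverAttempts(5); // 5 is the default, shown explicitly

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setFailoverSpi(failoverSpi);

        Ignite ignite = Ignition.start(cfg);
    }
}
```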
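
And a small, self-contained illustration of the graceful-shutdown flow discussed above: two server nodes are started in one JVM (assuming the default discovery configuration, so they find each other on localhost), one of them listens for discovery events, and the other is stopped gracefully, which surfaces as EVT_NODE_LEFT on the survivor. A killed process would instead surface as EVT_NODE_FAILED after the failure detection timeout:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class GracefulShutdownDemo {
    public static void main(String[] args) throws Exception {
        Ignite first = Ignition.start(config("first"));
        Ignite second = Ignition.start(config("second"));

        IgnitePredicate<Event> lsnr = evt -> {
            DiscoveryEvent de = (DiscoveryEvent) evt;
            System.out.println(evt.name() + " for node " + de.eventNode().id()
                + ", new topology version: " + de.topologyVersion());
            return true;
        };
        first.events().localListen(lsnr, EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED);

        // Graceful stop: "second" sends a TcpDiscoveryNodeLeftMessage,
        // so "first" observes EVT_NODE_LEFT with a new topology version.
        second.close();

        Thread.sleep(2_000); // give the event time to arrive before exiting
        first.close();
    }

    private static IgniteConfiguration config(String name) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setIgniteInstanceName(name);
        cfg.setIncludeEventTypes(EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED);
        return cfg;
    }
}
```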