
In this post, there is an approved comment with the following statement:

Cluster takes this to the next level by using a quorum agreement to prevent message loss in the case of node failure.

I'm testing message delivery in the case of a single cluster node failure, but from my observation, messages can get lost when a node fails.

I'm using io.aeron.samples.cluster.tutorial.BasicAuctionClusterClient from the Aeron code base together with io.aeron.samples.cluster.tutorial.BasicAuctionClusteredService (version 1.38.1).

I made a small adjustment in BasicAuctionClusteredService to see whether a message was received or not:

    public void onSessionMessage(
        final ClientSession session,
        final long timestamp,
        final DirectBuffer buffer,
        final int offset,
        final int length,
        final Header header)
    {
        final long correlationId = buffer.getLong(offset + CORRELATION_ID_OFFSET);
        System.out.println("Received message with correlation ID " + correlationId); // this line is added
        // the rest is the same
    }

When I start the cluster with 3 nodes, one of them is elected LEADER. Then I start the BasicAuctionClusterClient, which starts sending requests to the cluster.

When I stop the leader, a new one is elected as expected, but the messages sent between that point in time and the new leader election never reach the cluster (see the gap in correlation IDs below).

    New role is LEADER
    Received message with correlation ID -8046281870845246166
    attemptBid(this=Auction{bestPrice=144, currentWinningCustomerId=1}, price=152,customerId=1)
    Received message with correlation ID -8046281870845246165
    attemptBid(this=Auction{bestPrice=152, currentWinningCustomerId=1}, price=158,customerId=1)
    Consensus Module
    io.aeron.cluster.client.ClusterEvent: WARN - leader heartbeat timeout
    Received message with correlation ID -8046281870845246154
    attemptBid(this=Auction{bestPrice=158, currentWinningCustomerId=1}, price=167,customerId=1)

What is the developer expected to do if they want delivery (processing) to be guaranteed? Are they expected to build a custom acknowledgement system, with retries and duplicate-request handling on the cluster node's side?

Frank

3 Answers


Aeron Cluster provides certain guarantees, but they are slightly different from the ones you have in mind.

I'm testing message delivery in the case of a single cluster node failure, but from my observation, messages can get lost when a node fails.

There is nothing unusual about losing the last few messages that you published. There are many reasons why this can happen; the process on the receiving side can die, and so on.

If I read the code of io.aeron.cluster.client.AeronCluster#offer(org.agrona.DirectBuffer, int, int) correctly, it is a non-blocking publication that does not wait for the message to be committed before returning control to the client. I use the word 'committed' as defined by the Raft protocol that Aeron Cluster implements. If you read the Raft paper, it says:

Raft guarantees that committed entries are durable and will eventually be executed by all of the available state machines. A log entry is committed once the leader that created the entry has replicated it on a majority of the servers

If your messages were committed in a Raft sense before the previous leader died, your newly elected leader of a multi-node Aeron cluster will eventually process them in order.
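
To illustrate the non-blocking nature of offer, here is a minimal sketch (not the tutorial code) of retrying an offer until the ingress publication accepts it. The class and method names are made up; note that a non-negative result only means the message reached the ingress, not that it has been committed by a Raft majority.

    import io.aeron.Publication;
    import io.aeron.cluster.client.AeronCluster;
    import org.agrona.DirectBuffer;
    import org.agrona.concurrent.IdleStrategy;

    final class RetryingOfferSketch
    {
        // Retries a non-blocking offer until the ingress accepts it. A non-negative
        // result means the message was placed on the ingress publication; it does NOT
        // mean the entry has been committed by a Raft majority yet.
        static long offerWithRetry(
            final AeronCluster cluster,
            final DirectBuffer buffer,
            final int offset,
            final int length,
            final IdleStrategy idleStrategy)
        {
            long result;
            while ((result = cluster.offer(buffer, offset, length)) < 0)
            {
                if (Publication.NOT_CONNECTED == result ||
                    Publication.CLOSED == result ||
                    Publication.MAX_POSITION_EXCEEDED == result)
                {
                    throw new IllegalStateException("offer failed: " + result);
                }

                // BACK_PRESSURED or ADMIN_ACTION: idle, poll the egress and try again.
                idleStrategy.idle();
                cluster.pollEgress();
            }

            return result;
        }
    }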

Re your last question

What is the developer expected to do if they want delivery (processing) to be guaranteed?

  • check that the offer result is not negative (e.g. io.aeron.Publication#NOT_CONNECTED) to detect issues earlier, but more importantly
  • use a higher-level protocol with a sequence number/correlation ID that sends back ACKs from within your receiving io.aeron.cluster.service.ClusteredService implementation (see the sketch after this list). This guarantees that the message was committed in the Raft sense, as commitment is a prerequisite for the Aeron Cluster state machine to process it (onSessionMessage).
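
To sketch that second point, here is roughly what the ACK could look like on the service side. This is not the tutorial code: only the onSessionMessage portion of a ClusteredService is shown, CORRELATION_ID_OFFSET and the single-long ACK layout are assumptions, and a production service would handle egress back pressure more carefully.

    import io.aeron.cluster.service.ClientSession;
    import io.aeron.logbuffer.Header;
    import org.agrona.DirectBuffer;
    import org.agrona.ExpandableArrayBuffer;

    final class AckingServiceSketch
    {
        private static final int CORRELATION_ID_OFFSET = 0; // assumed layout, as in the tutorial
        private final ExpandableArrayBuffer egressBuffer = new ExpandableArrayBuffer(Long.BYTES);

        public void onSessionMessage(
            final ClientSession session,
            final long timestamp,
            final DirectBuffer buffer,
            final int offset,
            final int length,
            final Header header)
        {
            final long correlationId = buffer.getLong(offset + CORRELATION_ID_OFFSET);

            // ... apply the bid to the auction state as in the tutorial ...

            // Reaching this method means the entry was committed by a Raft majority,
            // so echoing the correlation ID on the egress tells the client it is durable.
            egressBuffer.putLong(0, correlationId);
            while (session.offer(egressBuffer, 0, Long.BYTES) < 0)
            {
                Thread.onSpinWait(); // egress can be back pressured; bound this in real code
            }
        }
    }

The client would then resend (with the same correlation ID) any message whose ACK does not arrive within a timeout, and the service would treat a replayed correlation ID as a duplicate.
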
Michael Szymczak

The point at which a client can guarantee that a message will survive a cluster failure is after that message has been acknowledged. Typically this is managed by having the application (i.e. the implementation of ClusteredService) send an acknowledgement message on the egress channel back to the client.
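
As a rough illustration (not from the tutorial), the client side could track outstanding correlation IDs and clear them when the ACK arrives on the egress. The listener is registered via AeronCluster.Context.egressListener(...) and driven by pollEgress(); the single-long ACK layout and all names here are assumptions.

    import io.aeron.cluster.client.EgressListener;
    import org.agrona.collections.LongHashSet;

    final class AckTrackingClientSketch
    {
        private final LongHashSet unacknowledged = new LongHashSet();

        // Invoked from AeronCluster.pollEgress(); assumes the service echoes the
        // correlation ID as the first long of each ACK message.
        final EgressListener egressListener =
            (clusterSessionId, timestamp, buffer, offset, length, header) ->
                unacknowledged.remove(buffer.getLong(offset));

        void onOffered(final long correlationId)
        {
            unacknowledged.add(correlationId); // call after a successful offer
        }

        boolean isDurable(final long correlationId)
        {
            return !unacknowledged.contains(correlationId); // ACK received => committed
        }
    }

A message would only be considered delivered once isDurable returns true; anything still outstanding after a timeout is a candidate for a resend.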

Michael Barker

Suggest testing it again with 5 nodes as a minimum. If a cluster only has 3 nodes, it is impossible to elect a new leader when the leader is down, because no quorum could be met (it is 1 vs. 1).

According to RAFT, the number of nodes (m) should be 4n + 1, n >= 1.

Richard
  • I don't think the leader election is the issue - I can see in the logs that the leader was elected after a while. There are some missing deliveries though. I will update the description to note that the leader election happened correctly. – Frank Aug 23 '22 at 14:14
  • On top of that, I don't think there is any suggestion of having 4 nodes in a Raft cluster. The recommended node count is, afaik, either 3 or 5. – Frank Aug 23 '22 at 14:17
  • I believe that your statement about 3 nodes not being sufficient to elect a leader in case of one node failure is not true. As per RAFT paper: "Raft uses randomized election timeouts to ensure that split votes are rare". One of the two remaining nodes initiates a vote and another one simply accepts it, giving you 2 out of 3 majority and a new leader. – Michael Szymczak Oct 14 '22 at 18:22