I was trying to migrate my Hyperledger Fabric network (running a RAFT ordering service) from one host to another.

During this process I made sure that TLS communication kept working, which meant making the required changes in the system channel before the migration. I used the backup and the genesis block of the old ordering service to restore the network on the target host. One thing I noticed was that when the orderer nodes started on the new host, they took 10 minutes to sync blocks and start the RAFT election.

My question: is this 10-minute delay a default configured in the orderer codebase, or is it caused by some other mechanism?

NOTE: I know that when an existing orderer node is added to an application channel, it takes 5 minutes by default for that orderer to detect the change. Is the situation above similar, or is it a different mechanism?

The complete orderer node (one that was started first on new host) logs can be found here.

Chintan Rajvir
  • can you attach logs? – yacovm May 06 '20 at 16:11
  • @yacovm I have added the required logs. The node starts at 14:25:44 (UTC) and block replication starts after 14:35:44 (UTC). After block replication, at 14:36:09 (UTC), a leader is reported for the system channel. At 14:36:12 we get the RAFT leader for our application channel. During the initial 10 minutes I cannot fetch channel blocks from the orderers; the request fails with "SERVICE_UNAVAILABLE". – Chintan Rajvir May 07 '20 at 07:29

1 Answer

The delay comes from eviction suspicion, a mechanism that triggers after a default timeout of 10 minutes.
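
As a quick sanity check, the timestamps reported in the comments on the question can be compared against that 10-minute default. The sketch below is only arithmetic on those two log times; the constant name `EVICTION_SUSPICION` and the helper `parse_utc` are made up for illustration and are not part of Fabric.

```python
from datetime import datetime, timedelta

# Assumption: the 10-minute default mentioned in this answer.
# The name is hypothetical and is not read from any Fabric config.
EVICTION_SUSPICION = timedelta(minutes=10)


def parse_utc(hms: str) -> datetime:
    """Parse an HH:MM:SS log timestamp (the date part is irrelevant here)."""
    return datetime.strptime(hms, "%H:%M:%S")


# Timestamps quoted in the comments on the question.
node_start = parse_utc("14:25:44")
replication_start = parse_utc("14:35:44")

# Time the orderer waited before it started pulling blocks.
delay = replication_start - node_start

print(delay)                        # 0:10:00
print(delay >= EVICTION_SUSPICION)  # True
```

The observed gap is exactly 10 minutes, which is consistent with the node waiting out the full suspicion period before probing the other orderers.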

yacovm
  • Cool! Understood. But when I migrate, I switch O3's TLS certs and then O2's certs in the channels at the source. Then I start O2 and O3 at the target (new TLS hosts). Ideally, since O2 would be ahead in blocks and would also have the new endpoint of O3, it should be able to talk to O3 right away. Eviction suspicion would happen only if neither of the 2 orderers knew anything about the other. Please correct me if this is not the right understanding! – Chintan Rajvir May 07 '20 at 10:51
  • "A node suspects its channel eviction when it doesn’t know about any elected leader nor can be elected as leader in the channel." Going by this, from O3's point of view, it does not know the new O2 endpoint, but O2 would know the new O3 endpoint. So, with O2 initiating communication, wouldn't the blocks be replicated right away? If not, how does O3 realize after 10 minutes that it can reach the new O2 endpoint? – Chintan Rajvir May 07 '20 at 10:55
  • Because it doesn't probe immediately, it only probes after 10 min – yacovm May 07 '20 at 13:06
  • My understanding is this: O2 tries to start an election as soon as it starts (and so does O3). But O3 does not know the O2 endpoint (assuming it sends vote messages to the endpoint recorded in the channel and NOT directly in response to messages received from O2). This leads to eviction suspicion after 10 min. How is O3 then able to communicate with the new endpoint of O2 after 10 min, even though it has not received the latest block containing this new O2 endpoint? Does it now try to reach O2 at the URI from which it is already receiving vote messages (instead of looking in the channel)? – Chintan Rajvir May 07 '20 at 13:25
  • I found the cause of this issue. Eviction suspicion occurred because O1 was the leader of the RAFT cluster at the source host. At the time of switching to the target, the config updates required O2 and O3 to be started. This led to eviction suspicion after 10 min, which in turn led O2 and O3 to communicate, with O2 winning the election because of a higher term. – Chintan Rajvir May 08 '20 at 11:04