
I have a Hyperledger Fabric network running on a single AWS instance using the default BYFN (build your first network) script.

Problem: the orderer, cli, and CA Docker containers show "Up" status, but all peer containers show "Exited" status.

The error occurs when:

  1. The BYFN network is running and the machine is rebooted (not under my control; it happens for some external reason).
  2. The network is left running overnight without shutting down the machine. It shows the same status the next morning.

Error shown:

docker ps -a

CONTAINER ID   IMAGE                               COMMAND             CREATED          STATUS                      PORTS                    NAMES
b0523a7b1730   hyperledger/fabric-tools:latest     "/bin/bash"         23 seconds ago   Up 21 seconds                                        cli
bfab227eb4df   hyperledger/fabric-peer:latest      "peer node start"   28 seconds ago   Exited (2) 23 seconds ago                            peer1.org1.example.com
6fd7e818fab3   hyperledger/fabric-peer:latest      "peer node start"   28 seconds ago   Exited (2) 19 seconds ago                            peer1.org2.example.com
1287b6d93a23   hyperledger/fabric-peer:latest      "peer node start"   28 seconds ago   Exited (2) 22 seconds ago                            peer0.org2.example.com
2684fc905258   hyperledger/fabric-orderer:latest   "orderer"           28 seconds ago   Up 26 seconds               0.0.0.0:7050->7050/tcp   orderer.example.com
93d33b51d352   hyperledger/fabric-peer:latest      "peer node start"   28 seconds ago   Exited (2) 25 seconds ago                            peer0.org1.example.com

Attaching docker log: https://hastebin.com/ahuyihubup.cs

Only the peers fail to start up. Steps I have tried to resolve the issue:
  1. docker start $(docker ps -aq), as well as starting individual peers manually.
  2. byfn down, generate, and then up again. Shows the same result as above.
  3. Rolled back to previous versions of the Fabric binaries. Same result on 1.1, 1.2 and 1.4. With the older binaries, the error does not recur when the network is left running overnight, but it does recur when the machine is restarted.
  4. Used older Docker images such as 1.1 and 1.2.
  5. Tried starting up only one peer, the orderer and the cli.
  6. Changed the network name and domain name.
  7. Uninstalled docker and docker-compose, then reinstalled them.
  8. Changed the port numbers of all nodes.
  9. Tried restarting without mounting any volumes.

The only thing that works is reformatting the AWS instance and reinstalling everything from scratch. Note that I am NOT using the AWS Blockchain Template. Any help would be appreciated; I have been stuck on this issue for a month now.

  • Most probably some corrupt file or drivers. I would definitely investigate the Docker logs of every container for anomalies. If that isn't conclusive, I would check the system logs. In addition, I would create a replica of the existing network with the exact same settings/configuration on a different EC2 instance to check whether the issue persists across servers. – Niraj Kumar Mar 14 '19 at 08:31
  • @NirajKumar Yes the issue persists across different servers. I have attached the docker log in the question. Was that of any help? Which system logs are you referring to? – the_stack_over_flew Mar 14 '19 at 08:46
  • I did see the log. I would recommend that next time, when you build the replica on a different server, you keep extra memory handy. I am not sure about your EC2 instance configuration, so I am of the opinion that having less than the required memory might be the issue. I would also update the configuration (the docker-compose file & the shell script) to take advantage of Docker Swarm, which restarts a container in the event of failure. – Niraj Kumar Mar 14 '19 at 09:11
  • @NirajKumar Details about Configuration: t3.medium instance, 4 GB/69 GB. On a different server, before and after creating the network, available memory changes from 3.3 GB to 2.9 GB – the_stack_over_flew Mar 14 '19 at 10:05
  • @NirajKumar I worked with Docker Swarm for multiple instances and it showed the same error. – the_stack_over_flew Mar 14 '19 at 10:07
  • Strange! Docker swarm should restart the peer nodes automatically. That's what it does. Well, I will post more if new information comes up! – Niraj Kumar Mar 14 '19 at 10:12

1 Answer


The error was resolved by adding the following lines to peer-base.yaml:

GODEBUG=netdns=go
dns_search: .
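For context, here is a sketch of where these two lines sit in peer-base.yaml. The service and image names follow the BYFN defaults; the rest of the file is abbreviated, and only the two marked lines are additions:

```yaml
# peer-base.yaml (abbreviated sketch; only the two marked lines are new)
services:
  peer-base:
    image: hyperledger/fabric-peer:latest
    environment:
      # ADDED: force Go's built-in DNS resolver instead of the C library's,
      # which avoids peer crashes when the host's resolver config is broken
      - GODEBUG=netdns=go
    # ADDED: restrict the container's DNS search domain to the root domain
    dns_search: .
```

GODEBUG=netdns=go goes under the environment key, while dns_search is a service-level docker-compose key, not an environment variable.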

Thanks to @gari-singh for the answer: https://stackoverflow.com/a/49649678/5248781