I have a gen_server on one system and 4 clients on 4 other systems. The code runs as expected for 3 or 4 days, and then the gen_server reports "** Removing (timedout) connection **". Because the clients can become active before or after the gen_server starts, the clients execute this code prior to every call to the gen_server:
connect_IPDB() ->
    % Try to connect to the server, retrying every 5 seconds
    case net_kernel:connect_node(?SERVER) of
        % Once connected, wait an additional 5 seconds for stability
        true -> timer:sleep(5000);
        false ->
            timer:sleep(5000),
            connect_IPDB()
    end.
This works as anticipated when bringing up the server or a client, in any order. They all connect and show up in nodes() when it is executed on the server.
Here is the problem. Some time after the "** Removing (timedout) connection **" error, nodes() on the server again shows all of the nodes, implying that the client is not hung and has executed the above code. However, communication with the timed-out node has not resumed. How can I reestablish the connection short of restarting the client? BTW, restarting the client does fix the issue.
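To make the symptom concrete, here is a minimal, untested sketch of the kind of check and forced reconnect I have in mind. The module name conn_check and the functions check/1 and reconnect/1 are hypothetical and not part of my current code. It only uses nodes(), net_adm:ping/1 and erlang:disconnect_node/1, and I am assuming (not asserting) that dropping the existing connection before pinging would force a fresh one to be set up.

```erlang
-module(conn_check).
-export([check/1, reconnect/1]).

%% Hypothetical helper: returns {Listed, Answers}, where Listed says
%% whether Node appears in nodes() and Answers says whether it replies
%% to net_adm:ping/1 (pong means it replied, pang means it did not).
check(Node) ->
    Listed = lists:member(Node, nodes()),
    Answers = (net_adm:ping(Node) =:= pong),
    {Listed, Answers}.

%% Hypothetical helper: drop any existing connection to Node, then ping
%% it so that a new connection attempt is made. Returns pong or pang.
%% Assumption: forcing a disconnect first clears a stale connection.
reconnect(Node) ->
    _ = erlang:disconnect_node(Node),
    net_adm:ping(Node).
```

A result of {true, false} from conn_check:check(Node) would match what I am seeing: the node is listed, but it does not answer.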
Any help is appreciated.