I am running Apache Storm with Nimbus on one node and two supervisors on two other nodes. After I launch the topology (with workers=2), it does not run correctly. In the worker log on one of the supervisor nodes, I see the following error:
2018-07-04 17:36:02.650 o.a.s.m.n.Client client-boss-1 [ERROR] connection attempt 1 to Netty-Client-hostname/X.X.X.X:6700 failed: org.apache.storm.shade.org.jboss.netty.channel.ConnectTimeoutException: connection timed out: hostname/X.X.X.X:6700
On the other worker node, I see the following error:
2018-07-04 17:34:11.344 o.a.s.m.n.Client client-boss-1 [ERROR] connection attempt 3 to Netty-Client-hostname1/X.X.X.X:6700 failed: java.net.ConnectException: Connection refused: hostname1/X.X.X.X:6700
There are no other errors in the worker logs. If I replace one of these workers with a different worker on the same subnet, the topology runs perfectly, so the problem appears to be the connection between these two worker nodes. However, /etc/hosts is set up correctly on both (identical to the file on the replacement worker that works), and the two workers can reach each other via ping and ssh. The connection between Nimbus and each worker is also fine: a topology with workers=1 runs correctly on either of them.
I am not sure what else could be causing this. Any help is appreciated.
EDIT:
After spending a lot of time debugging this, I found that incoming connections on port 6700 were being blocked on the worker node. I edited iptables to allow incoming TCP connections on that port. The worker logs still show some Netty connection errors, but the topology now runs fine.
sudo iptables -A INPUT -p tcp --dport 6700 -j ACCEPT
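
For anyone hitting the same thing: ping and ssh succeeding does not show that a specific TCP port is reachable, so it can help to test the worker port directly from the other node. Below is a minimal Python sketch for that, not anything provided by Storm. The function name check_port is made up for illustration, and the host names and port in the usage comment are just the placeholders from the logs above. As a rough guide, a timeout usually means packets are being silently dropped (for example by a firewall DROP rule), while a refusal usually means nothing is listening on that port or a rule is actively rejecting the connection.

```python
import socket


def check_port(host, port, timeout=3.0):
    """Try one TCP connection and report how it ended.

    Returns "open", "refused", "timeout", or "error: <details>".
    (Hypothetical helper for illustration; not part of Storm.)
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The remote side answered with a reset: no listener, or a REJECT rule
        return "refused"
    except socket.timeout:
        # No answer within the timeout: packets are likely being dropped
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. DNS resolution failure or no route to host
        return f"error: {exc}"


# Example usage, run on each worker against the other one
# (host names and port are the placeholders from the logs above):
#   print(check_port("hostname", 6700))
#   print(check_port("hostname1", 6700))
```

Note that 6700 is only the first of the default supervisor.slots.ports values (6700 to 6703), so if a supervisor runs more than one worker, the other slot ports may need the same check and the same firewall rule.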