I have a strange issue that I have not been able to fix despite a lot of searching, so I am asking for help here.
I have a cluster of about 10 servers.
A few weeks ago, one of the servers stopped communicating with the master server. On investigation, I found that the slave could no longer ping the master, although the master could still ping and communicate with the slave. The two were no longer clustered.
I assumed it was a firewall issue on the master and searched for the rule that might be causing it. Eventually I flushed the rules completely and rewrote them, but even after the flush the slave still could not ping the master, with 100% packet loss.
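One thing worth ruling out after a flush: with iptables, flushing the rules does not reset a chain's default policy, so a chain left at DROP keeps dropping traffic even with no rules in it. The following is only a minimal sketch, assuming the firewall on the master is iptables. The helper name `non_accept_policies` is hypothetical; it reads `iptables -S` style output on stdin and prints any built-in chain whose policy is not ACCEPT.

```shell
#!/bin/sh
# Hypothetical helper (not part of iptables): read `iptables -S` style
# output on stdin and print each built-in chain whose default policy
# is something other than ACCEPT, e.g. "INPUT DROP".
non_accept_policies() {
    awk '$1 == "-P" && $3 != "ACCEPT" { print $2, $3 }'
}

# Usage on the master (assumption: iptables is the firewall, run as root):
#   iptables -S | non_accept_policies
```

If this prints nothing, every built-in chain in the filter table has an ACCEPT policy and the flush really did leave that table open.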
Investigating further, I found an odd entry in netstat -a on the MASTER: it shows the slave under Local Address in the LISTEN state, even though that entry should not be there.
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:4084 0.0.0.0:* LISTEN
tcp 0 0 xxxxx.xxxxxxxxx.xx:5941 0.0.0.0:* LISTEN
tcp 0 0 dex.xxxxxxx.co.u:domain 0.0.0.0:* LISTEN
The third entry is the slave. It is a remote server, so it should not appear under Local Address, if I am not mistaken. As far as I can see, this is the cause of the lockout. Here it is again:
]# netstat -ntlp | grep 9954
tcp 0 0 xx.99.1x7.x:53 0.0.0.0:* LISTEN 9954/dnsmasq
~]# netstat -ntlp | grep 53
tcp 0 0 xx.99.1x7.x:53 0.0.0.0:* LISTEN
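A listening socket is normally bound to an address configured on the machine itself, so one further check is whether the address in that LISTEN entry is actually assigned to an interface on the master. This is a minimal sketch, not a definitive diagnosis. The helper name `is_local_ip` is hypothetical, and 192.0.2.10 is a placeholder standing in for the masked address above.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o addr show` style output on stdin and
# succeed (exit 0) if the address given as $1 is assigned to an interface.
# In that output the fourth field looks like "192.0.2.10/24".
is_local_ip() {
    awk -v ip="$1" '
        { split($4, a, "/"); if (a[1] == ip) found = 1 }
        END { exit !found }
    '
}

# Usage on the master (192.0.2.10 is a placeholder for the masked address):
#   ip -o addr show | is_local_ip 192.0.2.10 && echo "address is assigned locally"
```

If the check succeeds, the master itself holds that address, and it would be worth looking at why before trying to remove the netstat entry.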
I need help removing this entry so that the slave can ping the master again. I have tried:
tcpkill host xxx.xxxxxxxx.com
tcpkill host xx.99.1x7.x
but neither of these removed the entry or restored ping.
Is there something I have missed? The master has not been rebooted because it is a production server. Any suggestions would be highly appreciated.