
We have a problem with a server on our ESX. All other machines operate normally, but not this one. It is the only Linux Server running on our ESX (all the others operate on Windows), and the only one having this problem.

It was installed three weeks ago and operated normally until last Thursday. Since then it has been dropping connections to specific hosts at random. For example, I am working with the web interface of the installed software and have an SSH connection open (for viewing the logs). Suddenly both my browser and my SSH connection drop with "Connection refused" and I cannot reconnect, although ping still works. For my colleague, everything works. Later I can connect again and my colleague cannot. It seems as if only 2-3 people can connect to the server simultaneously.

The server has got a static IP address and there is a static lease in our DNS (Microsoft Active Directory based).

Applied configurations during product installation:

ulimit -n 8800

echo "* soft stack 32768" >> /etc/security/limits.conf
echo "* hard stack 32768" >> /etc/security/limits.conf
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf
echo "* soft nproc 16384" >> /etc/security/limits.conf
echo "* hard nproc 16384" >> /etc/security/limits.conf

The firewall was turned off (service firewalld stop), but this did not change anything. I am not seeing anything relevant in the messages logfile.

Installed software:

  • CentOS 7
  • IBM Business Process Server Advanced 8.5.6 (Based on IBM WebSphere)
  • IBM DB2 Express

I am a developer with basic network and Linux knowledge, and I am running out of ideas here. Are there any logs you would suggest I check? How can I debug this system?

javahippie
  • What do your logs say ? – user9517 Mar 06 '17 at 08:01
  • In the /var/log/messages I am not seeing anything which is related (as described in the text). If the connections are refused, there are no events for the same timespan. I would be happy to attach logs, but I am not sure in which of the logs such events would be logged. Any suggestions? – javahippie Mar 06 '17 at 08:03
  • There are more logs than just messages. Connection Refused has a specific meaning https://serverfault.com/questions/725262/what-causes-the-connection-refused-message – user9517 Mar 06 '17 at 08:20
  • Which logs would be helpful here? I am not that familiar with CentOS. I would be happy to provide the logs here, but I am not sure which ones. Thank you for the link, but there is no active firewall on the server and there are listening processes - the application can be accessed by other clients, but not by everybody at the same time, as stated above. – javahippie Mar 06 '17 at 08:23
  • Well, an *existing* connection cannot be dropped with "Connection refused"; it's more likely "Connection reset". What happens with a *new* connection you try to establish during the outage - does it time out, or is it refused immediately? Anyway, this behavior looks like an IP address conflict with some other network device to me. – Peter Zhabin Mar 06 '17 at 15:50
  • @PeterZhabin thank you for your response. We shut down the server this morning and found that "its" IP was still pingable. nmap tells me that the MAC address of the mysterious device belongs to a Samsung device, so we assume that it's a phone... From here on our admins are responsible; this is not a misconfiguration of our machine. Thank you for the hint. If you repost it as an answer, I will gladly accept it. – javahippie Mar 07 '17 at 08:20

1 Answer


Well, an existing connection cannot be dropped with "Connection refused"; it's more likely "Connection reset". What happens with a new connection you try to establish during the outage - does it time out, or is it refused immediately? Anyway, this behavior looks like an IP address conflict with some other network device to me.
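To answer the "timeout or refused immediately" question during an outage, something like the following rough sketch could be run from an affected client. It is only an illustration: the function name probe, the 3-second limit, and the host and port in the usage line are assumptions, and it relies on bash's /dev/tcp feature and the coreutils timeout command being available.

```shell
#!/usr/bin/env bash
# Classify one TCP connection attempt as "open", "timeout" or "refused".
# probe is a hypothetical helper name; 3 seconds is an arbitrary limit.
probe() {
  local host=$1 port=$2 rc
  # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "open"      # handshake completed
  elif [ "$rc" -eq 124 ]; then
    echo "timeout"   # timeout(1) killed the attempt: no answer at all
  else
    echo "refused"   # immediate failure, e.g. a reset (or another instant error)
  fi
}

# Example (placeholder host and port):
#   probe server.example.com 22
```

An immediate "refused" while a colleague can still connect at the same moment would suggest that a different device is answering for that IP address.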

Peter Zhabin
  • To clarify, it was indeed an IP address conflict. The server had a static IP, which was in the dynamic address pool. Thank you for the idea and solution! – javahippie Mar 22 '17 at 13:23
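For anyone hitting the same symptom: besides shutting the server down and pinging its address, a conflict can sometimes be spotted by counting how many different MAC addresses answer ARP requests for the IP. The sketch below assumes the reply format printed by the iputils arping tool ("Unicast reply from IP [MAC] ..."); the function names, the interface eth0, and the address 192.0.2.10 are placeholders, not values from this thread.

```shell
#!/bin/sh
# Count the distinct MAC addresses found in arping-style output on stdin.
# Assumed line format: "Unicast reply from 192.0.2.10 [AA:BB:CC:DD:EE:FF]  0.7ms"
count_macs() {
  grep -oiE '\[([0-9a-f]{2}:){5}[0-9a-f]{2}\]' \
    | tr 'a-f' 'A-F' \
    | sort -u \
    | wc -l \
    | tr -d ' '
}

# Report whether more than one device answered for the address.
check_conflict() {
  n=$(count_macs)
  if [ "$n" -gt 1 ]; then
    echo "conflict: $n MAC addresses answered"
  elif [ "$n" -eq 1 ]; then
    echo "ok: 1 MAC address answered"
  else
    echo "no replies"
  fi
}

# Real use needs root and must be run from another host on the same subnet
# (interface and IP below are placeholders):
#   arping -c 3 -I eth0 192.0.2.10 | check_conflict
```

Two different MAC addresses answering for one IP is a strong hint that a second device, such as a DHCP client that was handed the same address, is using it.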