1

I have been seeing a strange connection issue in our production environment.

The setup has two IBM HTTP Server (IHS) instances with a network IP load balancer in front of them (round-robin).

One moment the system is working fine, the next, requests stop arriving at the IHS. A telnet connection directly to port 80 of the IHS is established successfully, but a connection to port 80 through the IP of the load balancer fails!
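For anyone wanting to repeat the telnet comparison automatically during an incident, here is a minimal sketch. It only attempts a TCP connect to each address and records the outcome, so it shows whether the direct path and the load-balanced path disagree, not why. The function names (`tcp_check`, `probe`) and the addresses in the usage comment are placeholders I made up, not values from this setup.

```python
import socket
import time


def tcp_check(host, port=80, timeout=3.0):
    """Attempt one TCP connect; return (ok, detail, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection performs the same handshake telnet would.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected", time.monotonic() - start
    except OSError as exc:
        # Covers refused connections, timeouts and unreachable hosts.
        return False, f"{type(exc).__name__}: {exc}", time.monotonic() - start


def probe(targets, port=80, timeout=3.0):
    """Check every named target once; return {name: (ok, detail, elapsed)}."""
    return {name: tcp_check(host, port, timeout) for name, host in targets.items()}


# Hypothetical usage (addresses are placeholders):
#   targets = {"ihs1": "10.0.0.11", "ihs2": "10.0.0.12", "lb-vip": "10.0.0.10"}
#   for name, (ok, detail, elapsed) in probe(targets).items():
#       print(time.strftime("%H:%M:%S"), name, ok, detail, f"{elapsed:.2f}s")
```

Running this on a schedule from a client machine and keeping the timestamped output gives a record of when the load balancer address stopped answering while the servers behind it still did.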

Here is the puzzling part: the network admins say the load balancer is working fine, yet when we finally reboot the IHS servers, requests start flowing again...

This has happened three times in the last month, and no obvious pattern has been found.

Any debug ideas?

jpmartins
  • Have you sniffed any of the traffic to see what is going on, to determine where the problem is originating from (from the load balancer or the web server)? – Dave Drager Jan 29 '10 at 17:35
  • Thanks for the comment. It is a good troubleshooting idea, and we will have to try it. We have not yet sniffed traffic on the HTTP servers because, with all the stress, we forget to: the problem is quickly detected by the business and there is great pressure (shouting) to do whatever it takes to restore the system immediately (involving the chiefs). The load balancer side is out of our reach to sniff, since another team manages it, so I believe sniffing the HTTP server side alone will not be conclusive. When they answer about the state of the load balancer, they always say it is working fine... – jpmartins Jan 30 '10 at 11:21

2 Answers

1

You'd better sniff the traffic from the client; that way you can see the whole lag. Or sniff the client and the server at the same time.

  • Thanks for the replies. I don't even remember how this situation was solved. Consider this answer as accepted. In the company, sniffing traffic is restricted, and there is a strong separation of duties which sometimes makes it hard to really follow up on all the problems. The enterprise is using ITIL, so the Problem Management team should know how it was fixed :) – jpmartins Apr 15 '12 at 17:39
0

This sounds like either an ARP or a DHCP issue (perhaps a rogue DHCP server on the network? Or some sort of self-assigned IP addresses?).

The load balancer might be fine, but there might be something wrong between it and the HTTP server. Three times a month points to some kind of timeout issue (a DHCP lease renewal?).
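One way to test the ARP theory without a packet capture is to save the neighbour table from a host on the same segment while things work and again during an outage, then compare the two. The sketch below assumes the text comes from Linux `ip neigh show` (lines such as `10.0.0.1 dev eth0 lladdr aa:bb:cc:dd:ee:01 REACHABLE`); other systems print ARP entries differently and would need a different pattern. The function names `parse_neigh` and `changed_macs` are my own.

```python
import re

# Assumed format: "<ip> dev <iface> lladdr <mac> <state>" (Linux `ip neigh show`).
NEIGH_RE = re.compile(
    r"^(?P<ip>\S+)\s+dev\s+\S+\s+lladdr\s+(?P<mac>[0-9a-fA-F:]{17})\b"
)


def parse_neigh(text):
    """Return {ip: mac} for every entry that has a resolved MAC address."""
    table = {}
    for line in text.splitlines():
        match = NEIGH_RE.match(line.strip())
        if match:  # entries without lladdr (e.g. FAILED) are skipped
            table[match.group("ip")] = match.group("mac").lower()
    return table


def changed_macs(before, after):
    """Return {ip: (old_mac, new_mac)} for IPs whose MAC differs between snapshots."""
    return {
        ip: (before[ip], after[ip])
        for ip in before.keys() & after.keys()
        if before[ip] != after[ip]
    }
```

If the server's IP or the load balancer's IP shows up in `changed_macs` between a good snapshot and a bad one, that would support an address conflict or a stale ARP entry; an empty result does not rule out other causes.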

lorenzog