
I work for a government administration. Our team is responsible for application development; I'm personally responsible for the application servers (Glassfish), and a separate team manages the infrastructure (network, load balancer, Oracle DB, physical servers running Solaris on x86 machines).

Sometimes things go wrong and our application stops working. I try to collect as many traces as I can whenever there is a problem:

  • from the jvm running the appserver: jstack -l
  • from the system: prstat, pstack, vmstat, netstat

But, as often happens in production, I don't have much time and have to bounce the server ASAP.

Now a rarely used demo server has hung (I noticed it this morning, but it could have happened a couple of days ago). It seems to show the same symptoms as our unexplained production crashes:

  • the app server seems to be waiting for something (in production, it seems to be related to a DB connection)
  • the DBA doesn't see anything on the DB side
  • CPU levels stay low, and other things just work (web admin console, ...)

The Oracle DB connection uses an LDAP lookup. On our demo server, the app server seems to be stuck waiting for an LDAP connection to do something. The network team doesn't see any connection on the other side. However, in netstat (run from the zone where I have root), I can definitely see the connection as ESTABLISHED.

My question is: how can a connection be ESTABLISHED on my side and just sit there waiting for something, while the other side can't see it at all?

I'm guessing that if this can happen to an LDAP connection, it could happen to any connection (DB connection, ...).

The server is still in this state, so I can experiment on it (and provide more data) for a while, though not for too long.

ymajoros

1 Answer


This is just a wild guess, but here goes:

If the LDAP connection is to a Windows server (e.g. to retrieve information from AD), then it might be waiting for a password. Windows AD does not allow anonymous LDAP queries, so a username and password must be provided. It is possible that the client is somehow misconfigured and is waiting for a user to provide the password. If this is the case, then the actual LDAP connection might have timed out (which is why the guys on the other side can't see anything). AFAIK, the default timeout for LDAP connections into Windows AD isn't all that long, only a few minutes.
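To illustrate the general situation (this does not reproduce AD or your Glassfish setup): a client can hold a connection that its own TCP stack reports as ESTABLISHED and block on a read for as long as the peer stays silent. The sketch below is a hypothetical local demo; the function name `probe_silent_peer` and the loopback "server" are made up for illustration. It shows that a read timeout turns an indefinite hang into a quick, visible failure.

```python
import socket
import threading


def probe_silent_peer(timeout_s=0.5):
    """Connect to a local peer that accepts but never replies.

    Returns "timed out" if the read gives up after timeout_s seconds,
    or "data"/"closed" if the peer unexpectedly answered or hung up.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    accepted = []

    def accept_and_stay_silent():
        conn, _ = listener.accept()
        accepted.append(conn)  # keep the socket open, never send anything

    server = threading.Thread(target=accept_and_stay_silent, daemon=True)
    server.start()

    # From the client's point of view this connection is now ESTABLISHED.
    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(timeout_s)  # without this, recv() would block indefinitely
    try:
        data = client.recv(1024)
        return "data" if data else "closed"
    except socket.timeout:
        return "timed out"
    finally:
        client.close()
        server.join(timeout=1)
        for conn in accepted:
            conn.close()
        listener.close()


if __name__ == "__main__":
    print(probe_silent_peer())
```

If the client in your case has no read timeout configured, a thread stuck in the equivalent of `recv()` would look exactly like what you describe: low CPU, an ESTABLISHED socket in netstat, and nothing happening.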

wolfgangsz