18

My Java web service running on Jetty falls over after a few hours, and investigation indicates many sockets in CLOSE_WAIT status. While it is working OK there seem to be no sockets in CLOSE_WAIT status, but when it goes wrong there are loads.

I found this definition

CLOSE-WAIT: The local end-point has received a connection termination request and acknowledged it e.g. a passive close has been performed and the local end-point needs to perform an active close to leave this state.

With netstat on my server I see a list of TCP sockets in CLOSE_WAIT status; the local address is my server and the foreign address is my load balancer machine. So I assume this means the client (load balancer) has terminated the connection at its end in some improper way, and my server has not properly closed the connection at its end.

But how do I do that? My Java code doesn't deal with low-level sockets.

Or is the load balancer terminating the connection because of an earlier problem caused by something my server code is doing wrong?

Ren P
Paul Taylor
  • The question marked as a duplicate appears similar, but its solution is not helpful because it says the issue is with the client. We don't have control over the clients, only over the server, so we need a way for the server to cope even if the client is doing something wrong. – Paul Taylor Mar 05 '15 at 10:50
  • The issue isn't with the client, it's with the server, or rather with whichever end shows CLOSE_WAIT. @Kayaman Not really a duplicate, as the other one is about clients with this condition, and has a client-only solution. – user207421 Mar 05 '15 at 11:20
  • @EJP Oh, okay. I don't understand how I can fix this. I have a doGet(HttpServletRequest request, HttpServletResponse response) method in my servlet, which may call response.redirect(), response.sendError(), or more usually PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(response.getOutputStream(), CHARSET))); writer.write(out, results, responseFormat, isPretty); out.close(); and then return. I don't deal with sockets directly, so how do I fix this? – Paul Taylor Mar 05 '15 at 11:25
  • In fact, further reading makes it clear that it is not the responsibility of the code to close the writer associated with an HttpResponse; that is the responsibility of the servlet container. As these CLOSE_WAITs must be associated with an HttpRequest (not an ordinary file), I don't understand how my code could possibly cause this issue. – Paul Taylor Mar 05 '15 at 12:13
  • 4
    I think you have cause and effect backwards. The sockets in CLOSE_WAIT are a symptom of the server falling over, not the reason it fell over. – David Schwartz Mar 05 '15 at 16:23
  • @David Schwartz You are probably right, but can you explain further? I don't get it. – Paul Taylor Mar 05 '15 at 22:29
  • @PaulTaylor The server falls over, and because the server has failed, CLOSE_WAIT sockets build up. So the question is why the server falls over, not why CLOSE_WAIT sockets build up. – David Schwartz Mar 05 '15 at 22:58
  • But isn't it only the server that can put a socket in the CLOSE_WAIT state, or can that be done by the client? – Paul Taylor Mar 06 '15 at 09:15
  • 1
    It is only the *client* sending a FIN that can produce the CLOSE_WAIT, and if it persists it is caused by the server *failing* to do something, i.e. close the socket. @DavidSchwartz is right, you have this all back to front. – user207421 Mar 07 '15 at 18:36
  • Take a thread dump (kill -3) to see what Jetty is busy doing. If you do this twice, when the server is healthy and also when it's not, then you might have enough info to figure out what's wrong. – jtoberon Mar 07 '15 at 22:21
  • @EJP OK, that is useful to know. What I'm not getting is this: it seems unlikely that Jetty would cause the problem, because if it did there would be lots of complaints and it would get fixed. But I don't understand how my code can cause the problem either, because closing the connection between a client and the server is handled by Jetty. I just see HttpResponse and HttpRequest objects, and it's not even my code's responsibility to close them. – Paul Taylor Mar 08 '15 at 17:37

6 Answers

7

Sounds like a bug in Jetty or JVM, maybe this workaround will work for you: http://www.tux.hk/index.php?entry=entry090521-111844

Add the following lines to /etc/sysctl.conf

net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_intvl = 2
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_time = 1800

And then execute

sysctl -p

or do a reboot

Eirenliel
5

I suspect this could be something causing a long or infinite loop/infinite wait in your server code, and Jetty simply never gets a chance to close the connection (unless there's some sort of timeout that forcibly closes the socket after a certain period). Consider the following example:

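import java.io.IOException;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class TestSocketClosedWaitState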
{
    private static class SocketResponder implements Runnable
    {
        private final Socket socket;

        //Using static variable to control the infinite/waiting loop for testing purposes, with while(true) Eclipse would complain of dead code in writer.close() -line
        private static boolean infinite = true;

        public SocketResponder(Socket socket)
        {
            this.socket = socket;
        }       

        @Override
        public void run()
        {
            try
            {               
                PrintWriter writer = new PrintWriter(socket.getOutputStream()); 
                writer.write("Hello");              

                //Simulating slow response/getting stuck in an infinite loop/waiting something that never happens etc.
                do
                {
                    Thread.sleep(5000);
                }
                while(infinite);

                writer.close(); //The socket will stay in CLOSE_WAIT from server side until this line is reached
            }
            catch(Exception e)
            {
                e.printStackTrace();
            }           

            System.out.println("DONE");
        }
    }

    public static void main(String[] args) throws IOException
    {
        ServerSocket serverSocket = new ServerSocket(12345);

        while(true)
        {
            Socket socket = serverSocket.accept();
            Thread t = new Thread(new SocketResponder(socket));
            t.start();
        }       
    }
}

With the infinite variable set to true, the PrintWriter (and the underlying socket) never gets closed because of the infinite loop. If I run this, connect to the socket with telnet, and then quit the telnet client, netstat shows the server-side socket still in the CLOSE_WAIT state (I could also see the client-side socket in the FIN_WAIT2 state for a while, but it disappears):

~$ netstat -anp | grep 12345
tcp6       0      0 :::12345        :::*            LISTEN      6460/java       
tcp6       1      0 ::1:12345       ::1:34606       CLOSE_WAIT  6460/java   

The accepted socket on the server side gets stuck in the CLOSE_WAIT state. If I check the thread stacks for the process, I can see the thread waiting inside the do...while loop:

~$ jstack 6460

<OTHER THREADS>

"Thread-0" prio=10 tid=0x00007f424013d800 nid=0x194f waiting on condition [0x00007f423c50e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at TestSocketClosedWaitState$SocketResponder.run(TestSocketClosedWaitState.java:32)
    at java.lang.Thread.run(Thread.java:701)

<OTHER THREADS...>

If I set the infinite variable to false and do the same (connect the client and disconnect), the socket shows the CLOSE_WAIT state until the writer is closed (which closes the underlying socket), and then disappears. If the writer or socket is never closed, the server-side socket again gets stuck in CLOSE_WAIT, even if the thread terminates (I don't think this should occur in Jetty; if your method returns at some point, Jetty should probably take care of closing the socket).

So, the steps I'd suggest to find the culprit are:

  • Add logging to your methods to see where they are going/what they are doing
  • Check your code, are there any places where the execution could get stuck in an infinite loop or take a really long while, preventing the underlying socket from being closed?
  • If it still occurs, take a thread dump from the running Jetty-process with jstack the next time this problem occurs and try to identify any "stuck" threads
  • Is there a chance something might throw something (OutOfMemoryError or such) that might not get caught by the underlying Jetty-architecture calling your method? I've never peeked inside Jetty's internals, it could very well be catching Throwables, so this is probably not the issue, but maybe worth checking if all else fails
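As a rough complement to the jstack step above, the sketch below counts the JVM's live threads by state from inside the process, which could be logged periodically to spot a growing pile of BLOCKED or WAITING threads before the server stops responding. The class and method names are hypothetical, and it assumes you have somewhere to call it from (for example a diagnostic servlet or a scheduled logger); it is not a replacement for a full thread dump.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of Jetty or of the original answer.
class ThreadStateSummary {

    /** Counts the live threads in this JVM by state (RUNNABLE, BLOCKED, WAITING, ...). */
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getAllStackTraces() returns a snapshot of all live threads.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints a map of thread states to counts for the current JVM.
        System.out.println(countByState());
    }
}
```

A sudden jump in the BLOCKED count relative to a healthy baseline would be a hint to take a full dump with jstack and look at what the blocked threads are waiting for.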

You could also name the threads when they enter and exit your methods with something like

        String originalName = Thread.currentThread().getName();
        Thread.currentThread().setName("myMethod");

        //Your code...

        Thread.currentThread().setName(originalName);

to spot them more easily if there are a lot of threads running.
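One caveat with the renaming snippet: if the code in the middle throws, the original name is never restored and the pooled thread keeps the temporary name. A minimal sketch of a wrapper that restores the name in a finally block is shown here; ThreadNaming and callNamed are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration only.
class ThreadNaming {

    /** Runs the task with the current thread temporarily renamed; the old name is always restored. */
    static <T> T callNamed(String name, Callable<T> task) throws Exception {
        Thread current = Thread.currentThread();
        String originalName = current.getName();
        current.setName(name);
        try {
            return task.call();
        } finally {
            // Runs on normal return and on exceptions alike.
            current.setName(originalName);
        }
    }
}
```

In a servlet this could wrap the body of doGet, for example `ThreadNaming.callNamed("myMethod", () -> { /* your code */ return null; })`, so that stuck requests stand out by name in a thread dump.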

esaj
  • I got a jstack when it stopped responding, dunno if anything weird about it https://gist.github.com/mayhem/5f53a3c55e8e110cdb9f – Paul Taylor Mar 13 '15 at 16:16
  • And this is when things are OK: https://gist.github.com/mayhem/60be5a2134dc3daf8fd0. In the first one there do seem to be various code threads in the BLOCKED state, but what is blocking them? When it's working, all we seem to have is polling threads waiting for new client connections, I assume. – Paul Taylor Mar 13 '15 at 16:27
  • @PaulTaylor: Couldn't find anything else from those myself either, except for the first one showing everything in BLOCKED state (even when there's nothing like wait() -calls or such). Hard to say exactly what could cause it, maybe stop-the-world -garbage collection running frequently/non-stop? Is the CPU usage high when it stops responding? Anything out of the ordinary in the tomcat logs? Does it occur sooner if JVM is given less memory and later if it's given more? Could there be a memory leak somewhere? Just guessing really... – esaj Mar 14 '15 at 17:12
5

We had the same problem in our project. I'm not sure that this is your case, but maybe it will be helpful.

The reason was that a huge number of requests were handled by business logic inside a synchronized block. So when the client sent packets to drop the connection, the thread bound to that socket was busy waiting for the monitor.

The logs show exceptions for org.eclipse.jetty.io.WriteFlusher at write method:

DEBUG org.eclipse.jetty.io.WriteFlusher - write - write exception
org.eclipse.jetty.io.EofException: null
    at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:192) ~[jetty-io-9.2.10.v20150310.jar:9.2.10.v20150310]

and for org.eclipse.jetty.server.HttpOutput in the close method. I think the exception at the close step is the reason for the sockets' CLOSE_WAIT state:

DEBUG org.eclipse.jetty.server.HttpOutput - close -
org.eclipse.jetty.io.EofException: null
    at org.eclipse.jetty.server.HttpConnection$SendCallback.reset(HttpConnection.java:622) ~[jetty-server-9.2.10.v20150310.jar:9.2.10.v20150310]

The quick fix in our case was to increase idleTimeout. The right solution (again, in our case) is refactoring the code.

So my advice is to read Jetty's logs carefully at DEBUG level to find exceptions, and to analyze application performance with VisualVM. The cause may be a performance bottleneck (synchronized blocks?).
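To make the synchronized bottleneck concrete, here is a minimal sketch (not our production code; all names are hypothetical) in which one thread does slow work inside a synchronized block and a second thread tries to enter it. The second thread should show up as BLOCKED, which is the state to look for in a jstack dump or in VisualVM's thread view.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class for illustration only.
class MonitorContentionDemo {
    private static final Object LOCK = new Object();

    /** Returns the observed state of a thread waiting to enter a synchronized block held by another thread. */
    static Thread.State stateOfContendingThread() throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (LOCK) {
                lockHeld.countDown();
                try {
                    release.await(); // simulates slow business logic inside the critical section
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");

        Thread contender = new Thread(() -> {
            synchronized (LOCK) {
                // a request handler would do its work here
            }
        }, "contender");

        holder.start();
        lockHeld.await(); // make sure the monitor is taken before the contender starts
        contender.start();

        // Poll (for at most 5 seconds) until the contender is parked on the monitor.
        Thread.State state = contender.getState();
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            Thread.sleep(10);
            state = contender.getState();
        }

        release.countDown(); // let both threads finish
        holder.join();
        contender.join();
        return state;
    }
}
```

With many request threads in place of the single contender, every worker ends up queued on the same monitor, and none is free to notice that its client has already closed the connection.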

Vitalii Ivanov
1

I faced a similar problem. While the culprit code may differ, the symptoms were: 1) the server (Jetty) was running but not processing requests; 2) there was no extraordinary load and there were no exceptions; 3) there were too many CLOSE_WAIT connections.

These suggested that all the worker threads in the server were stuck somewhere. A jstack thread dump showed that all our worker threads were stuck in the Apache HttpClient object (because of unclosed response objects), and since all the threads were waiting indefinitely, none were available to process incoming requests.
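The general mechanism can be sketched without HttpClient itself: a pooled client hands out a limited number of connections, and a response that is never closed keeps its connection leased, so later callers wait. TinyPool below is a hypothetical stand-in written for illustration (it is not HttpClient's API); the point is that leasing through try-with-resources returns the slot even when the body throws.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a connection pool; not the Apache HttpClient API.
class TinyPool {
    private final Semaphore permits;

    TinyPool(int size) {
        this.permits = new Semaphore(size);
    }

    /** A leased slot; closing it returns the slot to the pool exactly once. */
    final class Lease implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean();

        @Override
        public void close() {
            if (closed.compareAndSet(false, true)) {
                permits.release();
            }
        }
    }

    /** Waits up to timeoutMillis for a free slot; returns null if none becomes free in time. */
    Lease lease(long timeoutMillis) throws InterruptedException {
        return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS) ? new Lease() : null;
    }

    /** Number of slots currently free. */
    int available() {
        return permits.availablePermits();
    }
}
```

Used as `try (TinyPool.Lease lease = pool.lease(1000)) { ... }`, the slot is always given back. If leases are taken and never closed, the pool drains and every further caller waits, which matches the picture of all worker threads stuck in the client.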

Tony Hinkle
-1

Is the load balancer still up? Try stopping the load balancer to see whether it, rather than the server, is the issue.

  • @TV Trailers Well, I think the issue is the load balancer, but everyone tells me it must be my server because the CLOSE_WAITs are on my server. But if the load balancer is not functioning, could that prevent the server from moving connections out of CLOSE_WAIT? An answer to a more basic HTTP question might help me: a client sends an HTTP request to the server, and the server then sends a response. What happens next? Does the client implicitly initiate the close of the connection when it receives the response, or does the server initiate the close after sending the response? – Paul Taylor Mar 11 '15 at 15:23
  • Depends on the server. If it's using keep-alive then, generally speaking, it's up to the client to close it when finished. That could be your problem. But a CLOSE_WAIT state can last for seconds, sometimes minutes. Does the state not change after waiting a few minutes? You might be looking at something that's not actually the problem. – TV Trailers Mar 11 '15 at 21:30
  • No, it doesn't change. When all is working fine we don't notice any CLOSE_WAITs, but when the server stops responding we notice loads of CLOSE_WAITs piling up. – Paul Taylor Mar 11 '15 at 22:00
  • Sounds like the load balancer is holding onto sockets. Just a guess. – TV Trailers Mar 11 '15 at 22:50
  • That's what I think. Please help me understand, then: can the load balancer prevent the server from closing the socket? – Paul Taylor Mar 12 '15 at 10:49
  • It isn't the load balancer. It is the local application where the CLOSE_WAITs are. If it were the load balancer not closing the connection, the CLOSE_WAIT wouldn't occur; it would stay ESTABLISHED. – user207421 Mar 14 '15 at 07:11
-2

This probably means you're not cleaning up your incoming connections. Make sure sockets are getting closed at the end of each transaction. (Best done in a finally block near the beginning of your server code so that connections get closed even if server side exceptions occur.)
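For code that does own the socket (a raw ServerSocket server rather than a servlet, where the container manages connections), a minimal sketch of the idea is shown here using try-with-resources, which behaves like a finally block that closes the resources. ClosingResponder and serveOnce are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical example for a hand-written socket server; not servlet code.
class ClosingResponder {

    /** Accepts one connection, writes a greeting, and always closes the socket. */
    static void serveOnce(ServerSocket serverSocket) throws IOException {
        // try-with-resources closes the writer and then the socket,
        // whether the body returns normally or throws.
        try (Socket socket = serverSocket.accept();
             PrintWriter writer = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8))) {
            writer.write("Hello");
        }
    }
}
```

Because the close happens on every exit path, an exception while building the response cannot leave the accepted socket open.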

Alex Fitzpatrick
  • Yes, but I don't deal with sockets directly, just HttpRequest and HttpResponse objects, and closing the associated connection is the responsibility of the servlet container (Jetty/Tomcat). – Paul Taylor Mar 12 '15 at 10:54
  • As someone else pointed out, this may be a bug in Jetty. Another thing to investigate is socket timeouts and keep-alive: http://serverfault.com/questions/420205/configure-keep-alive-timeout-on-jetty-6-1-19 – Alex Fitzpatrick Mar 12 '15 at 16:25
  • Actually, we moved over to Tomcat but are still seeing the same issue. – Paul Taylor Mar 13 '15 at 10:06