
I am trying to implement a web proxy server in Java that relays requests and responses between my browser and the web. In the current setup, the browser is configured to send all page requests to localhost on a specified port, and my proxy listens on that port for incoming requests.

The whole thing is threaded so that multiple requests can be handled at the same time, and here's what my code looks like:

   private void startProxy(int serverPort){

    try {
        // create a socket to listen on browser requests
        ServerSocket servSocket = new ServerSocket(serverPort);

        while(true) {
            // create a thread for each connection
            ProxyThread thread = new ProxyThread(servSocket.accept());
            thread.start();
        }
    } catch (IOException e) {}
   }

   class ProxyThread extends Thread {

    private Socket client;
    private Socket server;

    public ProxyThread(Socket client) {
        this.client = client;
        server = new Socket();
    }

    public void run() {
        // passes on requests and responses here
    }
   }

I have noticed that when I load a page that makes 20 different requests for HTML/CSS/JS resources, sometimes only 18-19 threads are created, and some requests are lost in the process. Most often it is a request for a JS resource or an image that gets dropped, and the dropped requests are never the last ones made by the browser, so it's not an issue of running out of resources.

Using Wireshark, I can see that the lost requests do reach localhost, so for some reason ServerSocket.accept() never accepts those connections. Are there any particular reasons why this might be happening? Or is my code wrong in some way?

EDIT

Here is the body of run():

     try {
            BufferedReader clientOut = new BufferedReader(
                    new InputStreamReader(client.getInputStream()));
            OutputStream clientIn = client.getOutputStream();

            // assign default port to 80
            int port = 80;
            String request = "";
            // read in the first line of a HTTP request containing the url
            String subRequest = clientOut.readLine();
            String host = getHost(subRequest);

            // read in the rest of the request
            while(!subRequest.equals("")) {
              request += subRequest + "\r\n";
              subRequest = clientOut.readLine();
            }
            request += "\r\n";
            try {
                server.connect(new InetSocketAddress(host, port));
            } catch (IOException e) {
                String errMsg = "HTTP/1.0 500\nContent Type: text/plain\n\n" + 
                "Error connecting to the server:\n" + e + "\n";
                 clientIn.write(errMsg.getBytes());
                 clientIn.flush();
            }

            PrintWriter serverOut = new PrintWriter(server.getOutputStream(), true);
            serverOut.println(request);
            serverOut.flush();

            InputStream serverIn = server.getInputStream();

            byte[] reply = new byte[4096];
            int bytesRead;
            while ((bytesRead = serverIn.read(reply)) != -1) {
               clientIn.write(reply, 0, bytesRead);
               clientIn.flush();
            }


            serverIn.close();
            serverOut.close();

            clientOut.close();
            clientIn.close();

            client.close();
            server.close();
        } catch(IOException e){
            e.printStackTrace();
        }
kawaiinekochan
  • Your code looks fine. (I don't know what other things to look at though) – user253751 Feb 16 '15 at 22:29
  • What exception was thrown? And if the requests get through, why haven't you posted the code that processes them? – user207421 Feb 16 '15 at 22:31
  • How do you come to the conclusion that only 18-19 threads are created? – Jean-François Savard Feb 16 '15 at 22:33
  • accept() blocks until a connection is made. Are you just not waiting long enough until you determine it's not working? – mttdbrd Feb 16 '15 at 22:37
  • @EJP there are no exceptions; my problem is that the browser is left waiting for responses to requests that never get to the proxy. The code that processes requests is the standard input/output stream that writes bytes back and forth – kawaiinekochan Feb 16 '15 at 22:45
  • How can you tell? With that code you cannot possibly know that. Your code ignores IOExceptions. And you need to clarify whether the requests do or do not get through: you've stated both already; and you need to post the missing code. It may look standard to you, but it doesn't work. – user207421 Feb 16 '15 at 22:49
  • @Jean-FrançoisSavard I use print statements to see which thread is created for which request, plus Wireshark lets me see exactly what goes in and out of my machine, so I can do a simple count of all the requests that my proxy makes – kawaiinekochan Feb 16 '15 at 22:49
  • In other words this isn't the real code. Waste of time posting it in that case. – user207421 Feb 16 '15 at 22:50
  • @EJP I added the thread body to the question. Some requests do get through and back to the browser, while a small portion of requests don't even make it to thread creation, which is why I think the problem is in the server socket – kawaiinekochan Feb 16 '15 at 23:09
  • So what exactly do you mean by 'requests that never get to the proxy'? No proxy code whatever can be expected to deal with requests that never arrive. – user207421 Feb 16 '15 at 23:23
  • NB the line terminator in HTTP is `\r\n`, not just `\n`; and your scheme of using a `BufferedReader` to read the headers and then reading the body via the underlying input stream won't work: you will lose data in the buffered reader. You've also completely forgotten about HTTP keep-alive, which is probably the real problem here. If you're going to implement HTTP you need to read RFC 2616. – user207421 Feb 16 '15 at 23:33
  • @EJP Say the browser makes a request for a html, a css and a js resource for a webpage. All three requests get to localhost, but only the html and the css get to server.accept() method, are processed and returned to the browser. The remaining js request is never accepted, even long after all other requests are dealt with. I'm trying to understand why that happens or if there is a way I can force the server socket to accept the js connection – kawaiinekochan Feb 16 '15 at 23:37
  • Starting a thread per request is not the most efficient way to do that. I suspect you either get refused connections because the accept queue is too short, or some resource is exhausted, and since you don't print the exceptions you don't know about it. – eckes Feb 16 '15 at 23:39
  • What does 'all three requests get to localhost' actually mean? What do you see in the sniffer? Three SYN packets? Three HTTP GET requests? – user207421 Feb 16 '15 at 23:43
  • @eckes He isn't starting a thread per request. He is starting a thread per connection. 90% of TCP/IP servers work that way. The default backlog queue is 50, or even 500 on some platforms. – user207421 Feb 16 '15 at 23:45
  • @EJP Three HTTP get requests. I did try to increase the backlog queue as well but that didn't seem to make a difference – kawaiinekochan Feb 16 '15 at 23:51
  • How many SYN packets? – user207421 Feb 16 '15 at 23:56
  • @EJP for a webpage with 10 requests, I get 10 HTTP GET, 7 SYN and SYN, ACK with 7 requests successfully passing through the proxy and 3 getting stuck – kawaiinekochan Feb 17 '15 at 00:12
  • Well, the proxy does not parse more than the first URL, so it cannot handle keep-alive. And no, most servers do not start one thread per connection; they use thread pools (see the sketch after these comments). – eckes Feb 17 '15 at 21:49
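
To illustrate that last point, here is a minimal sketch of what startProxy could look like with a fixed-size pool instead of a new Thread per connection. The pool size and the logging here are illustrative choices, not from the question:

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    private void startProxy(int serverPort) {
        // A fixed pool bounds concurrency and reuses threads across connections.
        ExecutorService pool = Executors.newFixedThreadPool(32);
        try (ServerSocket servSocket = new ServerSocket(serverPort)) {
            while (true) {
                // Thread implements Runnable, so the existing ProxyThread can be
                // submitted as a task; the pool invokes its run() method directly.
                pool.execute(new ProxyThread(servSocket.accept()));
            }
        } catch (IOException e) {
            e.printStackTrace();   // at least log it instead of swallowing it
        }
    }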

1 Answer


for a webpage with 10 requests, I get 10 HTTP GET, 7 SYN and SYN, ACK with 7 requests successfully passing through the proxy and 3 getting stuck.

So you have 7 separate connections but 10 requests, and you're only processing one request per connection. You've forgotten to implement HTTP keep-alive: see RFC 2616. More than one request may arrive on a single connection. You need to read exactly as many bytes per request as the Content-Length header defines, or as the sum of the chunk sizes, whichever is present (if either), and then, instead of just closing the socket, go back and try to read another request. If that gives you end of stream, close the socket.
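
Here is a minimal sketch of that loop, assuming request bodies are delimited by a Content-Length header (chunked encoding is omitted for brevity). It reads header lines one byte at a time straight off the raw InputStream, so no body bytes can be swallowed by a BufferedReader, as noted in the comments above. All the names are illustrative, not the asker's:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;

    class KeepAliveSketch {

        // Read one CRLF-terminated header line directly from the stream so
        // that no body bytes end up trapped in a reader's buffer.
        static String readHeaderLine(InputStream in) throws IOException {
            StringBuilder sb = new StringBuilder();
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\r') {
                    in.read();                              // consume the '\n'
                    return sb.toString();
                }
                sb.append((char) b);
            }
            return sb.length() == 0 ? null : sb.toString(); // null means end of stream
        }

        // Relay requests from one client connection until the client closes it.
        static void serveConnection(InputStream fromClient, OutputStream toServer)
                throws IOException {
            while (true) {
                String requestLine = readHeaderLine(fromClient);
                if (requestLine == null) {
                    return;                                 // end of stream: close the sockets
                }
                long contentLength = 0;
                StringBuilder headers = new StringBuilder(requestLine).append("\r\n");
                String line;
                while ((line = readHeaderLine(fromClient)) != null && !line.isEmpty()) {
                    if (line.toLowerCase().startsWith("content-length:")) {
                        contentLength = Long.parseLong(line.substring(15).trim());
                    }
                    headers.append(line).append("\r\n");
                }
                headers.append("\r\n");
                toServer.write(headers.toString().getBytes(StandardCharsets.ISO_8859_1));

                // Read exactly Content-Length body bytes, no more and no less,
                // so the next iteration starts cleanly at the next request.
                byte[] buf = new byte[4096];
                while (contentLength > 0) {
                    int n = fromClient.read(buf, 0, (int) Math.min(buf.length, contentLength));
                    if (n == -1) return;                    // premature end of stream
                    toServer.write(buf, 0, n);
                    contentLength -= n;
                }
                toServer.flush();
                // ...relay the server's response here, delimited the same way,
                // then loop back for the next request on this connection.
            }
        }
    }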

Or else send your response back to the client as HTTP/1.0, or with a Connection: close header, so that the client won't try to reuse the connection for another request.
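
For that second option, here is a sketch of what the start of such a response could look like, written to the question's clientIn output stream. The literals are illustrative; a real proxy would copy the actual status line and headers from the origin server's response:

    // HTTP/1.0 responses are non-persistent by default; the Connection: close
    // header makes that explicit for HTTP/1.1 clients as well.
    String head = "HTTP/1.0 200 OK\r\n"
                + "Connection: close\r\n"
                + "Content-Type: text/html\r\n"
                + "\r\n";
    clientIn.write(head.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1));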

user207421