5

I have created an application which sends GET requests to a URL, and then downloads the full content of that page.

The client sends a GET to e.g. stackoverflow.com, and forwards the response to a parser, which has the responsibility to find all the resources on the page that need to be downloaded with subsequent GET requests.

The method below is used to send those GET requests. It is called many times consecutively, with the URLs returned by the parser. Most of those URLs are located on the same host, and should be able to share the TCP connection.

public static void sendGetRequestToSubObject(String recUrl) throws IOException
    {
        // Open a connection to the given URL and wrap its response stream.
        URL url = new URL(recUrl);
        URLConnection connection = url.openConnection();
        InputStreamReader isr = new InputStreamReader(connection.getInputStream());
    }

Each time this method is called, a new TCP connection is created (with a TCP 3-way handshake) and the GET is then sent on that connection. But I want to reuse the TCP connections, to improve performance.

I guess that since I create a new URL object each time the method is called, this is the way it is going to work...

Maybe someone can help me do this in a better way?

Thanks!

plithner

2 Answers

6

HttpURLConnection will reuse connections if it can!

For this to work, several preconditions need to be fulfilled, mostly on the server side. Those preconditions are described in the article linked to above.
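On the client side, the JDK's documented networking properties `http.keepAlive` (default `true`) and `http.maxConnections` (default 5 idle connections kept alive per destination) also influence this reuse. The following is only a minimal sketch of setting them; the class name `KeepAliveSettings`, the method `apply`, and its parameter are hypothetical, and the assumption is that the properties are set before the first HTTP request is made.

```java
// Hypothetical helper, not part of the original post.
class KeepAliveSettings {

    // Assumption: call this before the first HTTP request, since the
    // HTTP protocol handler may read these properties only once.
    static void apply(int maxIdleConnectionsPerHost) {
        // Keep-alive is enabled by default; set explicitly for clarity.
        System.setProperty("http.keepAlive", "true");
        // Maximum number of idle connections kept alive per destination.
        System.setProperty("http.maxConnections",
                Integer.toString(maxIdleConnectionsPerHost));
    }
}
```

Calling `KeepAliveSettings.apply(8)` at startup would, under these assumptions, allow more idle connections per host than the default. Whether it helps depends on the server and on how the responses are read.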

Joachim Sauer
  • Thanks! I have read that document, and I think the preconditions are fulfilled. I know that the server fulfills them, because I have tried accessing the URLs with several different browsers (IE, Chrome, Firefox) and in all cases the TCP connection is reused for several HTTP GETs. – plithner Mar 28 '11 at 16:59
  • I still end up thinking that I'm doing something wrong in the code... Like the fact that for each URL I create a new URL object. It does not feel right, but I have not been able to do it any other way. – plithner Mar 28 '11 at 17:00
  • @user: since `URL` objects can't be modified, you have to create a new one each time, so that's certainly not the problem. – Joachim Sauer Mar 28 '11 at 18:31
  • Sorry for commenting on my own post like this... but I realize that what you (Joachim) were referring to was this: "but the underlying network connection to the HTTP server may be transparently shared by other instances". I guess that creating a new instance for each GET would be OK then. Any other ideas what I might be doing wrong? I have tried adding connection.setRequestProperty("Connection","keep-alive"); but it does not help. – plithner Mar 28 '11 at 20:38
  • Some more information: I found that when I access a certain URL (www.itunes.com) I can see that some GETs are sent on the same TCP connection. Great news, but only a few of the GETs are sent this way. When using a "real" browser to access the same URL, several more requests are sent on the same connection. The end result is that my client takes several times longer to download the page. – plithner Mar 29 '11 at 07:50
  • Is there any solution for this scenario? – SkyEagle888 Feb 21 '14 at 07:15
2

Found the problem! I was not reading the input stream properly. This caused the input stream objects to hang, and they could not be reused.

I only defined it, like this:

InputStreamReader isr = new InputStreamReader(connection.getInputStream());

but I never read from it :-)
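As a rough illustration of what "reading it properly" can look like with only the standard library, here is a minimal sketch that reads the response body to the end and then closes the stream, which is what lets HttpURLConnection hand the underlying socket back for reuse. The names `KeepAliveFetcher`, `readFully`, and `fetch` are hypothetical and not taken from the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class, shown only as a sketch.
class KeepAliveFetcher {

    // Reads the stream until end-of-stream and returns every byte read.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    // Fetches one URL, consuming the whole body and closing the stream.
    static byte[] fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(address).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return readFully(in);
        } catch (IOException e) {
            // On an error response, drain and close the error stream too,
            // so the connection is not left with unread data.
            InputStream err = conn.getErrorStream();
            if (err != null) {
                try (InputStream es = err) {
                    readFully(es);
                }
            }
            throw e;
        }
    }
}
```

The sketch deliberately does not call `disconnect()`, on the assumption that doing so may close the socket instead of keeping it available for the next request to the same host.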

I changed the read method as well. Instead of a buffered reader I stole this:

InputStream in = null;
String queryResult = "";
try {
     URL url = new URL(archiveQuery);
     HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
     httpConn.setAllowUserInteraction(false);
     httpConn.connect();
     in = httpConn.getInputStream();
     BufferedInputStream bis = new BufferedInputStream(in);
     // ByteArrayBuffer is org.apache.http.util.ByteArrayBuffer
     ByteArrayBuffer baf = new ByteArrayBuffer(50);
     int read = 0;
     int bufSize = 512;
     byte[] buffer = new byte[bufSize];
     // Read until end of stream so the whole response body is consumed.
     while(true){
         read = bis.read(buffer);
         if(read==-1){
           break;
         }
         baf.append(buffer, 0, read);
     }
     queryResult = new String(baf.toByteArray());
} catch (MalformedURLException e) {
     // DEBUG (Log is android.util.Log)
     Log.e("DEBUG: ", e.toString());
} catch (IOException e) {
     // DEBUG
     Log.e("DEBUG: ", e.toString());
} finally {
     // Close the stream once the body has been read.
     if (in != null) {
          try { in.close(); } catch (IOException ignored) { }
     }
}

From here: Reading HttpURLConnection InputStream - manual buffer or BufferedInputStream?

plithner