tr := &http.Transport{
    TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}
client := &http.Client{Transport: tr}
response, err := client.Get(link)
if err != nil {
    fmt.Println(err)
}
defer response.Body.Close()

//block forever at the next line
content, _ = ioutil.ReadAll(response.Body)

The above is my code to read content from a webpage; it runs in a loop. I found that the line ioutil.ReadAll(response.Body) sometimes blocks forever. This happens randomly, but it almost always happens on this webpage: http://xkcd.com/55 . Interestingly, when I do curl http://xkcd.com/55 it returns nothing, while wget http://xkcd.com/55 returns the whole page.

  • Maybe there are redirects: see my download function to follow those, with a cookie jar included: https://github.com/VonC/senvgo/blob/bf74db02b675bb36e0213bfdc68d6750c5bf944f/main.go#L1929-L1979 – VonC May 31 '14 at 07:43
  • I just tested downloading http://xkcd.com/55, and it works just fine (with my version of the http code) – VonC May 31 '14 at 07:53
  • Did you notice that you are using an `https` client on an `http` connection? – fabrizioM May 31 '14 at 16:57
  • You should maybe add a timeout for your download so you will not get stuck if a server is indeed slow and the problem is not within your code. – Not_a_Golfer May 31 '14 at 17:51
  • Re: VonC's comment, note that http://xkcd.com/55 redirects to http://xkcd.com/55/ with the slash, so if this code doesn't work with redirects (I don't actually know), yes, you'd be hosed here. – twotwotwo May 31 '14 at 21:30

4 Answers


I suspect your problem is that you try to read the response body even if there's an error:

if err != nil {
    fmt.Println(err)
}

You should either put the rest of the body in an else, or return, continue, or otherwise bail out after printing the error. As written, your ReadAll() line runs even when Get() failed, in which case response is nil and dereferencing it is a bug.

(If you originally copied this from the Get() example code, note that it includes a log.Fatalf() in the error leg, which terminates the program.)

I suspect that, as you say, occasionally you are getting a network error for one reason or another. Are you checking the output for the result of the Println()? The way you've done it, I could imagine it easily getting buried in the output.

As @twotwotwo notes, this URL returns a redirect to the same URL with a trailing slash. Get() will automatically handle this for you, so that's not the problem. By default curl does not follow redirects while wget does. You can see the header information by passing -i to curl.


Other things to verify:

  • Make sure your defer is actually being called. Remember, defer runs at the end of the function, not at the end of the current scope. So if you're in a loop (as you mention), you will just accumulate deferred calls and never actually close these responses.

  • If the server in fact never closes the connection, then ioutil.ReadAll() will never return. This is a feature. If you want a timeout, you need to handle that yourself. You should be able to test this hypothesis with tools like curl.

Rob Napier
  • Thanks @Rob, yes, it's a good point. I put in a lot of logging and found out the err didn't happen here. However, it indeed blocked at ioutil.ReadAll(response.Body). What's interesting is that it never blocks on the first request; it almost always blocks on the second or third request. It looks like when the server detects multiple requests in a short time, it treats the second or third incoming connection differently from the first one. I totally understand the server may use some strategy to protect itself from abuse. However, I hope it also will not block forever. –  Jun 01 '14 at 14:28

Additionally, avoid reading the response body with ReadAll without any memory/buffer limit. For example:

googleResponse := GoogleResponse{}
err = json.NewDecoder(io.LimitReader(resp.Body, MAX_MEMORY)).Decode(&googleResponse)
if err != nil {
    return nil, err
}

Read more about it in these good blog posts:

  • Crossing Streams: a Love Letter to io.Reader by Jason Moiron
  • ioutil.ReadAll(httpResponse.Body) memory consumption
  • Golang Slices And The Case Of The Missing Memory

Vitalii Velikodnyi

Your code should work as expected. I am guessing it's a network issue. Try setting a higher timeout.

package main

import (
    "crypto/tls"
    "fmt"
    "io/ioutil"
    "net/http"
)

func main() {

    link := "http://xkcd.com/55"

    tr := &http.Transport{
        TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
    }
    client := &http.Client{Transport: tr}
    response, err := client.Get(link)
    if err != nil {
        fmt.Println(err)
    }
    defer response.Body.Close()

    //block forever at the next line
    content, _ := ioutil.ReadAll(response.Body)

    fmt.Println(string(content))

}
GoLang Master

I probably have found the solution by adding DisableKeepAlives: true to the `&http.Transport`, like this:

tr := &http.Transport{
    TLSClientConfig:   &tls.Config{InsecureSkipVerify: true},
    DisableKeepAlives: true,
}

Since I made this change, I haven't encountered any long blocking yet. But I'm not 100% sure this is the solution. I will leave the new code running for a day or two; if there is no blocking, I'll consider the problem solved.

  • So far it's not blocking. However, the longest wait was about ~15 minutes. I'm wondering how to set a timeout on `ioutil.ReadAll(response.Body)`. –  Jun 02 '14 at 04:46
  • It turned out this is not the solution. It blocks forever again. It looks like I still need to find a way to set a timeout on the line `ioutil.ReadAll(response.Body)`. –  Jun 02 '14 at 09:01