For a program I'm making, this function is run as a goroutine in a for loop, once per URL passed in (there is no set amount).

func makeRequest(url string, ch chan<- string, errors map[string]error){
  res, err := http.Get(url)
  if err != nil {
    errors[url] = err
    close(ch)
    return
  }

  defer res.Body.Close()
  body, _ := ioutil.ReadAll(res.Body)
  ch <- string(body)
}

The entire body of the response has to be used, so ioutil.ReadAll seemed like the perfect fit. But with no restriction on the number of URLs that can be passed in, and ReadAll by nature storing everything in memory, it's starting to feel less like the golden ticket. I'm fairly new to Go, so if you do decide to answer, some explanation behind your solution would be greatly appreciated!

  • What data is contained in the response body? JSON? A custom format? Data you need to store in a file or put in a database? How does the other side of ch handle it? – morganbaz Sep 27 '18 at 15:15
  • @Howl It could be whatever the user inputs for a URL; it's normally just HTML. After ioutil.ReadAll, body is a byte slice, and after string(body) it's the full HTML body as a string. It doesn't need to be stored anywhere; I was just wondering if there is a more efficient way to get the full HTML bodies without loading them all into memory. –  Sep 27 '18 at 15:23
  • What are you doing with them? Where would you store them other than in memory? – Adrian Sep 27 '18 at 17:12
  • If you need the full body you must read it. The number of URLs is not important, as you are working on them serially. What is the actual problem? – Volker Sep 27 '18 at 18:26

3 Answers

One insight that I got as I learned how to use Go is that ReadAll is often inefficient for large readers and, as in your case, is subject to arbitrarily large input, which can exhaust memory. When I started out, I used to do JSON parsing like this:

data, err := ioutil.ReadAll(r)
if err != nil {
    return err
}
json.Unmarshal(data, &v)

Then, I learned of a much more efficient way of parsing JSON, which is to simply use the Decoder type.

err := json.NewDecoder(r).Decode(&v)
if err != nil {
    return err
}

Not only is this more concise, it is much more efficient, both memory-wise and time-wise:

  • The decoder doesn't have to allocate a huge byte slice to accommodate the data being read - it can simply reuse a tiny buffer, calling Read repeatedly to fetch and parse all the data. This saves a lot of time in allocations and takes stress off the GC
  • The JSON Decoder can start parsing data as soon as the first chunk of data comes in - it doesn't have to wait for everything to finish downloading.

Now, of course, your question has nothing to do with JSON, but this example is useful to illustrate that if you can use Read directly and parse the data a chunk at a time, do it. Especially with HTTP requests, parsing is faster than reading/downloading, so the parsed data can be ready almost the moment the request body finishes arriving.
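
For instance, here's a minimal sketch of decoding a JSON response straight off the wire (the endpoint and the target struct are made up for illustration, and this assumes a surrounding function that returns an error):

resp, err := http.Get("https://api.example.com/user") // hypothetical endpoint
if err != nil {
    return err
}
defer resp.Body.Close()

var v struct {
    Name string `json:"name"`
}
// Decode pulls small chunks from resp.Body and parses them as they arrive,
// so the full body is never buffered in memory.
if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
    return err
}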

In your case, you seem not to be actually doing any handling of the data for now, so there's not much to suggest to aid you specifically. But the io.Reader and the io.Writer interfaces are the Go equivalent of UNIX pipes, and so you can use them in many different places:

Writing data to a file:

f, err := os.Create("file")
if err != nil {
    return err 
}
defer f.Close()

// Copy will put all the data from Body into f, without creating a huge buffer in memory
// (moves chunks at a time)
io.Copy(f, resp.Body)

Printing everything to stdout:

io.Copy(os.Stdout, resp.Body)

Pipe a response's body to a request's body:

req, err := http.NewRequest("POST", "https://example.com", resp.Body)
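
If it helps, here's a slightly fuller sketch of that last one (the target URL is a placeholder and error handling is kept minimal): the body of one response is streamed straight into the outgoing request, chunk by chunk, without ever being held in memory whole.

req, err := http.NewRequest("POST", "https://example.com", resp.Body)
if err != nil {
    return err
}

// The client reads from resp.Body while it sends, so only small chunks
// are in memory at any moment.
resp2, err := http.DefaultClient.Do(req)
if err != nil {
    return err
}
defer resp2.Body.Close()
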
morganbaz
  • Would using bytes.Buffer and copying the response body into the buffer be a better option than ReadAll? I'm having trouble knowing which would be better, as the reader/writer paradigm is new to me. I could also use a scanner, scan each line into a bytes buffer, and then use the String method on the buffer, but I assume that with both of those approaches all the bodies are still in memory. –  Sep 27 '18 at 15:47
  • For example, here's a new implementation using a scanner and storing the output in a byte buffer: https://play.golang.org/p/P5Ey9DfSRsA I don't know how to tell which is more efficient. –  Sep 27 '18 at 15:53
  • `bytes.Buffer` still requires allocating byte slices in memory as you go, so it doesn't really give much of an advantage over `ioutil.ReadAll` (although to make it more efficient, you can probably read the `Content-Length`, sanity-check it, and then pre-size the buffer with `bytes.NewBuffer(make([]byte, 0, contentLength))`; see the sketch after these comments). Scanning is a better option if you can handle a line, then toss it away and move to the next one. It leads to worse performance if you instead need to place each line somewhere else in memory (and handle it later). – morganbaz Sep 28 '18 at 08:21
  • In your example what does `&v` refer to? How do you define that? – Connor Leech Sep 27 '22 at 17:33
  • @ConnorLeech see the documentation for [json.Unmarshal](https://pkg.go.dev/encoding/json#Unmarshal) (Decoder.Decode refers to that, as it is the same.) Essentially, most of the time it should be a struct representing your go equivalent to the JSON structure, as per the examples of Unmarshal, but it can also be a simple `var v any`. – morganbaz Sep 27 '22 at 21:08
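
A minimal sketch of the Content-Length pre-sizing idea from the bytes.Buffer comment above, dropped into the question's makeRequest (the 10 MB sanity cap is an arbitrary assumption):

buf := bytes.NewBuffer(nil)
if n := res.ContentLength; n > 0 && n < 10<<20 { // pre-size only when the length is known and sane
    buf = bytes.NewBuffer(make([]byte, 0, int(n)))
}
if _, err := io.Copy(buf, res.Body); err != nil {
    errors[url] = err
    return
}
ch <- buf.String() // the whole body still ends up in memory, just with fewer reallocations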

In order to bound the amount of memory that your application is using, the common approach is to read into a fixed-size buffer, which should directly address your ioutil.ReadAll problem.
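
For example, a rough sketch of that fixed-buffer loop (assuming a function that returns an error; process stands in for whatever you do with each chunk):

buf := make([]byte, 32*1024) // a fixed-size buffer bounds memory use
for {
    n, err := res.Body.Read(buf)
    if n > 0 {
        process(buf[:n]) // placeholder: write it out, feed a parser, send it on a channel, ...
    }
    if err == io.EOF {
        break // the body has been fully read
    }
    if err != nil {
        return err
    }
}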

Go's bufio package offers utilities (Scanner) that support reading until a delimiter, or reading a line from the input, which is highly relevant to @Howl's question.
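
As a sketch of the Scanner approach (again assuming a function that returns an error; the per-line handling is a stand-in):

sc := bufio.NewScanner(res.Body)
for sc.Scan() {
    line := sc.Text() // one line at a time; the scanner reuses its internal buffer
    _ = line          // stand-in: handle the line here, then let it go
}
if err := sc.Err(); err != nil {
    return err
}

Note that Scanner's default maximum token size is 64 KB, so very long lines would need sc.Buffer to raise that limit.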

dm03514
  • So, for example, this could be used to stop reading once the body of the HTML is reached, if I only needed the data from meta tags etc.? Sounds perfect, but I'm curious: if I filled a byte buffer using a scanner with the entire response body, would this not be exactly the same as ReadAll? Here's an example: https://play.golang.org/p/P5Ey9DfSRsA –  Sep 27 '18 at 16:11

That is actually pretty simple in Go.

Here is the client program:

package main

import (
    "fmt"
    "net/http"
)

var data []byte // shared read buffer; fine here since only one goroutine reads into it

func main() {
    data = make([]byte, 128)

    ch := make(chan string)

    go makeRequest("http://localhost:8080", ch)

    for v := range ch {
        fmt.Println(v)
    }
}

func makeRequest(url string, ch chan<- string) {
    res, err := http.Get(url)
    if err != nil {
        close(ch)
        return
    }
    defer res.Body.Close()
    defer close(ch) //don't forget to close the channel as well

    for {
        n, err := res.Body.Read(data)
        if n > 0 {
            ch <- string(data[:n]) // send each chunk as soon as it is read
        }
        if err != nil { // io.EOF (or any other error) ends the loop
            break
        }
    }
}

Here is the server program:

package main

import (
    "net/http"
)

func main() {
    http.HandleFunc("/", hello)
    http.ListenAndServe("localhost:8080", nil)
}

func hello(w http.ResponseWriter, r *http.Request) {
    http.ServeFile(w, r, "movie.mkv")
}
nilsocket