
There is probably an answer within reach, but most search results are either about "handling large file uploads", where the user doesn't know what they're doing, or about "handling many uploads", where the answer is consistently just an explanation of how to work with multipart requests and/or Flash uploader widgets.

I haven't had time to sift through Go's HTTP implementation yet, but when does the application first get a chance to see the incoming body? Not until it has been completely received?

If I were to [poorly] decide to use HTTP to transfer a large amount of data and posted a single request with several 10-gigabyte parts, would I have to wait for the whole thing to be received before processing it, or does the io.Reader for the body let me process it incrementally as it arrives?

This is only tangentially related, but I also haven't been able to get a clear answer about whether I can forcibly close the connection partway through, or whether, even if I close it, the server will just keep receiving data on the port.

Thanks so much.

Dustin Oprea

2 Answers


An application's handler is called after the headers are parsed and before the request body is read. The handler can read the request body as soon as the handler is called. The server does not buffer the entire request body.

An application can read file uploads without buffering the entire request by getting a multipart reader and iterating through the parts.
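
A sketch of iterating over the parts with `r.MultipartReader` and `NextPart`, assuming Go 1.16+ (for `io.Discard`). `uploadHandler` and the test helper `send` are hypothetical names; `io.Discard` stands in for wherever a real application would stream each part (a file, S3, and so on). Unlike `r.ParseMultipartForm`, this never builds the `multipart.Form` map.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// uploadHandler walks a multipart/form-data body one part at a time.
// Each part is consumed as a stream before the next one is requested.
func uploadHandler(w http.ResponseWriter, r *http.Request) {
	mr, err := r.MultipartReader()
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	var report bytes.Buffer
	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			break // all parts consumed
		}
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		name := part.FormName()
		// Stream the part somewhere real; io.Discard just counts bytes here.
		n, err := io.Copy(io.Discard, part)
		part.Close()
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		fmt.Fprintf(&report, "%s=%d\n", name, n)
	}
	// Respond only after the body has been read: for HTTP/1.x, writing
	// the response earlier can prevent further reads of r.Body.
	w.Write(report.Bytes())
}

// send builds a multipart body with one file part per {name, content}
// pair and runs it through uploadHandler in memory via httptest.
func send(parts [][2]string) (string, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	for _, p := range parts {
		fw, err := mw.CreateFormFile(p[0], p[0]+".bin")
		if err != nil {
			return "", err
		}
		if _, err := io.WriteString(fw, p[1]); err != nil {
			return "", err
		}
	}
	if err := mw.Close(); err != nil {
		return "", err
	}
	req := httptest.NewRequest(http.MethodPost, "/upload", &body)
	req.Header.Set("Content-Type", mw.FormDataContentType())
	rec := httptest.NewRecorder()
	uploadHandler(rec, req)
	return rec.Body.String(), nil
}

func main() {
	fmt.Println(send([][2]string{{"a", "abc"}, {"b", "hello"}}))
}
```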

An application can replace the request body with a MaxBytesReader to force close the connection after a specified limit is breached.

The above comments are about the net/http server included in the standard library. The comments may not apply to other servers.

Charlie Tumahai
  • An additional piece of the puzzle: `request.Body.Close()` reads the remaining request body. If you want to abandon it partway through reading, use `panic`. – leaf bebop Jan 20 '18 at 05:54
  • Yeah. I ran across `MaxBytesReader` in the server implementation (in the OPTIONS support, to mitigate overconsumption). Per `multipart.Reader`: "Reader is an iterator over parts in a MIME multipart body. Reader's underlying parser consumes its input as needed. Seeking isn't supported." However, the implementation is a `map`. How are you supposed to iterate over parts if all you have is a `map`? https://golang.org/pkg/mime/multipart/#Form – Dustin Oprea Jan 20 '18 at 06:07
  • I overlooked the multipart-reader part. Ordinarily the whole thing is parsed into memory/disk, but the multipart-reader produces one `multipart.Part` after another. Thanks. – Dustin Oprea Jan 20 '18 at 06:15
  • The [multipart.Reader example](https://godoc.org/mime/multipart#example-NewReader) shows how to iterate through the parts (you can ignore everything through to the NewReader call). – Charlie Tumahai Jan 20 '18 at 06:16

While I haven't done this with GB-sized files, my strategy for file processing (mostly data I read from and write to S3) is to use https://golang.org/pkg/os/exec/ to run a command-line utility that handles chunking the way you like, then read and process the output by tailing the file, as explained in "Reading log files as they're updated in Go".

In my case, network utilities can download the data far faster than my code can process it, so it makes sense to send it to disk and pick it up as fast as I can; that way I'm not holding a connection open while I process.

dustinevan