
Overview

I have a Go HTTP server using the labstack echo framework, built with Go 1.13.

$ go version
go version go1.13.7 linux/amd64

I'm monitoring a number of different statistics about the server, including the number of goroutines. I periodically see brief spikes of thousands of goroutines, even though high load shouldn't push the count past a few hundred. These spikes do not correlate with an increase in HTTP requests as logged by the labstack echo middleware.

To better debug this situation, I added a periodic check in the program which sends me a pprof report on the goroutines if the number spikes.
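Roughly, that check looks like the sketch below. The threshold, interval, and file output are placeholders; the real program sends the report rather than writing a local file.

package main

import (
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// watchGoroutines dumps a goroutine profile whenever the count crosses a threshold.
func watchGoroutines(threshold int, interval time.Duration) {
	for range time.Tick(interval) {
		if runtime.NumGoroutine() > threshold {
			f, err := os.Create("goroutine-spike.pprof")
			if err != nil {
				continue
			}
			// debug=1 writes the human-readable summary shown below.
			pprof.Lookup("goroutine").WriteTo(f, 1)
			f.Close()
		}
	}
}

func main() {
	// In the real program this runs alongside the echo server; the values are made up.
	go watchGoroutines(500, 10*time.Second)
	select {}
}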

The additional goroutines surprised me: when the server is in "normal" operating mode, I see zero goroutines in any of the functions listed below.

goroutine profile: total 1946
601 @ 0x4435f0 0x4542e1 0x8f09dc 0x472c61
#       0x8f09db        net/http.(*persistConn).readLoop+0xf0b  /usr/local/go/src/net/http/transport.go:2027

601 @ 0x4435f0 0x4542e1 0x8f2943 0x472c61
#       0x8f2942        net/http.(*persistConn).writeLoop+0x1c2 /usr/local/go/src/net/http/transport.go:2205

601 @ 0x4435f0 0x4542e1 0x8f705a 0x472c61
#       0x8f7059        net/http.setRequestCancel.func3+0x129   /usr/local/go/src/net/http/client.go:321

What I'm struggling with, however, is where these goroutines are coming from, what they indicate, and at what point in an HTTP request I would expect to see them.

To my untrained eye, it looks as if something is briefly attempting to open a connection and then immediately tries to close it.

But it would be good to have confirmation of this. In what part of an http request do readLoop, writeLoop, and setRequestCancel goroutines get started? What do these goroutines indicate?
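For anyone hitting the same stacks: as far as I can tell, these goroutines belong to outgoing requests made through net/http's client Transport, not to requests the echo server receives (see the comments below). A minimal sketch, with a placeholder URL, that starts all three kinds of goroutines:

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"runtime"
	"time"
)

func main() {
	fmt.Println("goroutines before:", runtime.NumGoroutine())

	// A non-zero Timeout makes setRequestCancel start a timer goroutine for the
	// request; the Transport starts one readLoop and one writeLoop goroutine for
	// each new persistent connection it dials.
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get("https://example.com/") // placeholder URL
	if err == nil {
		io.Copy(ioutil.Discard, resp.Body)
		resp.Body.Close()
	}

	// The idle connection is kept alive for re-use, so its readLoop/writeLoop
	// goroutines stick around after the request finishes.
	fmt.Println("goroutines after:", runtime.NumGoroutine())
}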

Notes

A few things I've looked at:

  • I tried adding middleware to capture request frequencies from IP addresses as they came in, and to report on those when the spikes happen (see the sketch after this list). The total request count remains low, in the 30-40 range, even as the spike is happening. No IP address is anomalous.
  • I've considered executing something like lsof to find open connections but that seems like a tenuous approach at best, and relies on my understanding of what these goroutines mean.
  • I've tried to cross-correlate the timing of seeing this with other things on the network, but without understanding what could cause this, I can't make much sense of where the potential culprit may lie.
  • If the number of goroutines exceeds 8192, the program crashes with the error: race: limit on 8192 simultaneously alive goroutines is exceeded, dying. A search for this error leads to this github issue, which feels relevant because I am, in fact, using gorilla websockets in the program. However, the binary was compiled with -race, and no race report is emitted alongside my error, so my situation appears to be entirely different from that question.
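The per-IP counting middleware from the first note looks roughly like the sketch below. The names and the echo/v4 import path are my assumptions; the real middleware also reports the counts when a spike is detected.

package main

import (
	"sync"

	"github.com/labstack/echo/v4"
)

var (
	mu    sync.Mutex
	perIP = map[string]int{}
)

// countByIP increments a counter for the client's IP before passing the request on.
func countByIP(next echo.HandlerFunc) echo.HandlerFunc {
	return func(c echo.Context) error {
		mu.Lock()
		perIP[c.RealIP()]++
		mu.Unlock()
		return next(c)
	}
}

func main() {
	e := echo.New()
	e.Use(countByIP)
	// ... register handlers, then start the server as usual.
}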
  • Those are just idle connections. – JimB Nov 05 '20 at 01:59
  • @JimB Is the `setRequestCancel` call part of the creation of the idle connections? Does go periodically instantiate thousands of idle connections? I would imagine they would need to be triggered by something external.... – MirroredFate Nov 05 '20 at 16:58
  • You can see the source for `setRequestCancel`, and the documentation explains the use case; the associated goroutine will be around for the duration of the request. Without an example I can't say what is triggering your server to make these requests, but the transport is expected to maintain idle connections to be re-used later. – JimB Nov 05 '20 at 17:07
  • @JimB Thanks for the help, the key wording that it was the server making the requests as the origin of the idle connections caused a lot of pieces to crystalize. I'd had it in my head that these were incoming requests to the server, not outgoing ones. If you'd like to write up a brief answer about that, I'll mark it as accepted. I solved my problem as soon as I thought about looking for requests the server was making instead of ones it was receiving. – MirroredFate Nov 05 '20 at 23:29
