Overview
I have a Go echo http server running with version 1.13.
$ go version
go version go1.13.7 linux/amd64
I'm monitoring a number of different statistics about the server, including the number of goroutines. I periodically see brief spikes of thousands of goroutines, when high load shouldn't cause it to exceed maybe a few hundred. These spikes do not correlate to an increase in http requests as logged by the labstack echo middleware.
To better debug this situation, I added a periodic check in the program which sends me a pprof report on the goroutines if the number spikes.
The added goroutines surprised me, as when the server is in "normal" operating mode, I see 0 goroutines of the listed functions.
goroutine profile: total 1946
601 @ 0x4435f0 0x4542e1 0x8f09dc 0x472c61
# 0x8f09db net/http.(*persistConn).readLoop+0xf0b /usr/local/go/src/net/http/transport.go:2027
601 @ 0x4435f0 0x4542e1 0x8f2943 0x472c61
# 0x8f2942 net/http.(*persistConn).writeLoop+0x1c2 /usr/local/go/src/net/http/transport.go:2205
601 @ 0x4435f0 0x4542e1 0x8f705a 0x472c61
# 0x8f7059 net/http.setRequestCancel.func3+0x129 /usr/local/go/src/net/http/client.go:321
What I'm struggling with, however, is where these are coming from, what they indicate, and at what point in an http request would I expect them.
To my untrained eye, it looks as if something is briefly attempting to open a connection the immediately tries to close it.
But it would be good to have confirmation of this. In what part of an http request do readLoop
, writeLoop
, and setRequestCancel
goroutines get started? What do these goroutines indicate?
Notes
A few things I've looked at:
- I tried adding middleware to capture requests frequencies from IP addresses as they came in, and report on those when the spikes happen. To total request number remains low, in the 30-40 range even as this spike is happening. No IP address is anomalous.
- I've considered executing something like
lsof
to find open connections but that seems like a tenuous approach at best, and relies on my understanding of what these goroutines mean. - I've tried to cross-correlate the timing of seeing this with other things on the network, but without understanding what could cause this, I can't make much sense of where the potential culprit may lie.
- If the number of goroutines exceeds 8192, the program crashes with the error:
race: limit on 8192 simultaneously alive goroutines is exceeded, dying
. A search for this error gets me to this github issue, which feels relevant because I am, in fact, using gorilla websockets in the program. However, the binary was compiled with-race
and no race condition is spit out along with my error, which is entirely different from the aforementioned question.