Context
I've encountered a scenario where NGINX can easily be overwhelmed by my load-testing scripts when it is handling many connections to very slow upstream servers. This has led me to wonder how NGINX proxying handles upstream requests.
Load Testing
In this scenario, I am running Kubernetes on Azure with the ingress-nginx controller. I have 3 replicas of the ingress controller, each running on a 2 vCPU / 4 GiB node (F2s_v2). I have deployed a simple Go application that holds each connection open for 60 seconds, to simulate an application with very slow response times.
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func longWait(w http.ResponseWriter, req *http.Request) {
	fmt.Println("Looong function.")
	time.Sleep(60 * time.Second)
	_, _ = fmt.Fprintf(w, "Waited 60 seconds!\n")
}

func main() {
	http.HandleFunc("/long", longWait)
	fmt.Println("Listening on :8080.")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
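Note that time.Sleep only parks the goroutine, so the backend itself consumes essentially no CPU while it holds connections open; any CPU growth seen during the test should therefore come from the proxy layer rather than the upstream.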
I used artillery.io to generate load against the web server, with configuration like this:
config:
  target: "https://<fqdn>"
  phases:
    - duration: 15
      arrivalRate: 5
      rampTo: 50
      name: Ramp up load quickly
    - duration: 600
      arrivalRate: 50
      name: Sustained load
scenarios:
  - name: "Load"
    flow:
      - get:
          url: "/long"
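For a rough local reproduction without artillery, a minimal Go client along these lines (a hypothetical sketch, not part of my original setup; <fqdn> is a placeholder as above) can hold open a comparable number of concurrent requests:

package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	// Hypothetical stand-in for the artillery run: hold ~200 concurrent
	// requests to the slow endpoint open at once.
	const concurrency = 200
	target := "https://<fqdn>/long" // placeholder, as in the artillery config

	var wg sync.WaitGroup
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := http.Get(target)
			if err != nil {
				fmt.Println("request failed:", err)
				return
			}
			resp.Body.Close()
		}()
	}
	wg.Wait()
}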
When I run the load test, the instances' CPU usage climbs until an HPA spins up more replicas to assist. If I set the arrival rate to 200, however, the instances quickly exhaust the available CPU on each node, even with 3 replicas running at the start of the test.
Research
While trying to understand this behaviour, I found the following notes:
14.4 nginx Internals: Upstream and load balancers are also worth describing briefly. Upstreams are used to implement what can be identified as a content handler which is a reverse proxy (proxy_pass handler). Upstream modules mostly prepare the request to be sent to an upstream server (or "backend") and receive the response from the upstream server. There are no calls to output filters here. What an upstream module does exactly is set callbacks to be invoked when the upstream server is ready to be written to and read from.
Answer by Daniel: Nginx uses a single thread to read requests without blocking (epoll/kqueue), that's right. But the single thread looper blocks and waits for response from the backend python or php server, that's a known major drawback of Nginx.
The former suggests that NGINX registers callbacks to be invoked when an upstream response is ready to be handled (i.e. returned to the original client). The latter, however, seems to imply that although NGINX handles incoming clients in an event-driven manner, calls to an upstream that may take a long time to respond block the worker and incur CPU time while being handled.
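As a point of comparison (an analogy only, not nginx's actual implementation), Go's net/http behaves the way the first note describes: a goroutine waiting on a silent upstream is parked on the network poller (epoll/kqueue) and burns no CPU. A minimal reverse proxy in front of the slow backend above makes this easy to observe:

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Proxy to the slow Go backend from earlier. While the upstream is
	// silent for 60 seconds, the goroutines serving these requests are
	// parked on network I/O and consume no CPU.
	upstream, err := url.Parse("http://localhost:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	log.Fatal(http.ListenAndServe(":8081", proxy))
}

Watching top while this proxy holds a few hundred /long requests open shows near-zero CPU, which is what I would also expect from nginx if upstream waits are purely event-driven.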
Question
My question is: which of these behaviours is correct? And if upstream requests are handled in an event-driven way, why does the CPU usage of my instances rise so sharply when 'all the server is doing' is holding open many long-lived requests?
Many thanks in advance.