I'm hoping someone here can help me solve a problem I'm having on Google Kubernetes Engine (GKE).
The problem in short: when I upload a 15-20 MB file through my PHP application, the nginx ingress controller crashes. Disk I/O rises rapidly, then CPU usage rises, and it takes about 5-30 minutes until I/O and CPU drop again and everything restarts successfully.
Here are the logs from the nginx-ingress-controller containers showing what happens, with my comments:
The app successfully receives the upload:
INFO 2020-02-14 14:30:55.481 CET 10.102.1.1 - [10.102.1.1] - - [14/Feb/2020:13:30:55 +0000] "POST /api/v1/contracts/38141/file-system/upload HTTP/2.0" 499 0
NGINX then starts producing tons of log lines like this:
INFO 2020-02-14 14:30:55.819 CET *�I�g�*��\u001AnK67�@?+�(%u052f��O�yqq$+u$,�b�<*�9#\t��\u0003d\u0006+����I�]A�%u0110jv��hAp\"�63�9\u0019Q�{�x|K�\u000BE\u001C��\"-P%u0079�\u001Ed�Tv
After many such lines, there are warnings that the ingress endpoints are not available:
WARN 2020-02-14T13:31:05.505984Z Service "gitlab-managed-apps/ingress-nginx-ingress-default-backend" does not have any active Endpoint
WARN 2020-02-14 14:31:05.526 CET Service "my-app/my-app" does not have any active Endpoint.
WARN 2020-02-14 14:31:05.526 CET Service "my-app/app-staging" does not have any active Endpoint.
... skipped access logs ...
WARN 2020-02-14 14:32:34.419 CET failed to renew lease gitlab-managed-apps/ingress-controller-leader-nginx: failed to tryAcquireOrRenew context deadline exceeded
2020-02-14 14:32:42.227 CET attempting to acquire leader lease gitlab-managed-apps/ingress-controller-leader-nginx...
ERROR 2020-02-14 14:32:43.464 CET Failed to update lock: Operation cannot be fulfilled on configmaps "ingress-controller-leader-nginx": the object has been modified; please apply your changes to the latest version and try again
At this point a client uploads another file, which again produces tons of log lines full of symbols. After those lines, the following is logged:
INFO 2020-02-14T13:33:37.525466Z Received SIGTERM, shutting down
INFO 2020-02-14T13:33:55.513100Z Received SIGTERM, shutting down
INFO 2020-02-14T13:33:55.513155Z Shutting down controller queues
INFO 2020-02-14T13:33:55.516017Z updating status of Ingress rules (remove)
ERROR 2020-02-14T13:33:55.570340Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
INFO 2020-02-14T13:33:55.574690Z Shutting down controller queues
INFO 2020-02-14T13:33:55.576049Z updating status of Ingress rules (remove)
ERROR 2020-02-14T13:33:55.610722Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
ERROR 2020-02-14T13:33:55.774881Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
INFO 2020-02-14T13:33:55.776321Z failed to renew lease gitlab-managed-apps/ingress-controller-leader-nginx: failed to tryAcquireOrRenew context deadline exceeded
INFO 2020-02-14T13:33:55.781376Z attempting to acquire leader lease gitlab-managed-apps/ingress-controller-leader-nginx...
INFO 2020-02-14T13:33:56.826124Z successfully acquired lease gitlab-managed-apps/ingress-controller-leader-nginx
INFO 2020-02-14T13:33:56.833827Z new leader elected: ingress-nginx-ingress-controller-756f8d9cbb-86xnh
ERROR 2020-02-14T13:33:56.933107Z queue has been shutdown, failed to enqueue: &ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,ManagedFields:[],}
INFO 2020-02-14T13:33:58.027600Z new leader elected: ingress-nginx-ingress-controller-756f8d9cbb-86xnh
ERROR 2020-02-14T13:33:58.117920Z Failed to update lock: Operation cannot be fulfilled on configmaps "ingress-controller-leader-nginx": the object has been modified; please apply your changes to the latest version and try again
INFO 2020-02-14T13:33:59.709458Z Stopping NGINX process
INFO 2020-02-14T13:33:59.718181Z Stopping NGINX process
ERROR 2020-02-14T13:34:03.010148Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: dial unix /tmp/nginx-status-server.sock: i/o timeout
ERROR 2020-02-14T13:34:12.627155Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout
ERROR 2020-02-14T13:34:12.832624Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout
ERROR 2020-02-14T13:34:13.693853Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
ERROR 2020-02-14T13:34:13.693930Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout
INFO 2020-02-14T13:34:41.620594055Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.620664183Z NGINX Ingress controller
INFO 2020-02-14T13:34:41.620671154Z Release: 0.25.1
INFO 2020-02-14T13:34:41.620675964Z Build: git-5179893a9
INFO 2020-02-14T13:34:41.620681055Z Repository: https://github.com/kubernetes/ingress-nginx/
INFO 2020-02-14T13:34:41.620686042Z nginx version: openresty/1.15.8.1
INFO 2020-02-14T13:34:41.620691348Z
INFO 2020-02-14T13:34:41.620695778Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.620701128Z
INFO 2020-02-14T13:34:41.622564Z Watching for Ingress class: nginx
WARN 2020-02-14T13:34:41.622863Z SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
INFO 2020-02-14T13:34:41.623360607Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.623418446Z NGINX Ingress controller
INFO 2020-02-14T13:34:41.623425256Z Release: 0.25.1
INFO 2020-02-14T13:34:41.623426Z Watching for Ingress class: nginx
INFO 2020-02-14T13:34:41.623430244Z Build: git-5179893a9
INFO 2020-02-14T13:34:41.623435128Z Repository: https://github.com/kubernetes/ingress-nginx/
INFO 2020-02-14T13:34:41.623441533Z nginx version: openresty/1.15.8.1
INFO 2020-02-14T13:34:41.623447006Z
INFO 2020-02-14T13:34:41.623451329Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.623456382Z
WARN 2020-02-14T13:34:41.623731Z SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
ERROR 2020-02-14T13:34:41.629507140Z nginx version: openresty/1.15.8.1
WARN 2020-02-14T13:34:41.633116Z Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
INFO 2020-02-14T13:34:41.633644Z Creating API client for https://10.103.0.1:443
ERROR 2020-02-14T13:34:41.640959117Z nginx version: openresty/1.15.8.1
WARN 2020-02-14T13:34:41.642065Z Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
INFO 2020-02-14T13:34:41.642376Z Creating API client for https://10.103.0.1:443
INFO 2020-02-14T13:34:41.682018Z Running in Kubernetes cluster version v1.13+ (v1.13.12-gke.25) - git (clean) commit 654de8cac69f1fc5db6f2de0b88d6d027bc15828 - platform linux/amd64
INFO 2020-02-14T13:34:41.700374Z Running in Kubernetes cluster version v1.13+ (v1.13.12-gke.25) - git (clean) commit 654de8cac69f1fc5db6f2de0b88d6d027bc15828 - platform linux/amd64
The logs show that nginx crashed and was restarted, and I don't know why.
My questions:
What could cause the nginx health check to fail so that the pod is terminated? Can I configure buffering in nginx-ingress to prevent this? Does it happen because the heavy logging overloads the disk? Or is it because nginx is buffering the uploaded file and takes too long to respond to the health check? How can I avoid it?
Here are the nginx-ingress annotations I have already tried. The problem occurs both with and without them:
nginx.ingress.kubernetes.io/client-body-buffer-size: 5m
nginx.ingress.kubernetes.io/proxy-body-size: 15m
nginx.ingress.kubernetes.io/proxy-buffering: "on"
nginx.org/client-max-body-size: 15m
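As a sanity check on these values, here is a minimal Python sketch that compares the upload sizes against the limits above. It assumes nginx's usual binary size suffixes (k = 1024 bytes, m = 1024 * 1024 bytes), that `proxy-body-size` ends up as nginx's `client_max_body_size`, and that `client-body-buffer-size` ends up as `client_body_buffer_size`. The helper names (`parse_nginx_size`, `fits_body_limit`, `spills_to_temp_file`) are made up for this illustration and are not part of nginx or the ingress controller.

```python
import re

# Multipliers for nginx size suffixes (assumption: binary units, only k and m).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2}


def parse_nginx_size(value: str) -> int:
    """Convert an nginx-style size string such as '15m' into bytes."""
    match = re.fullmatch(r"(\d+)([km]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported size value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def fits_body_limit(upload_bytes: int, max_body_size: str) -> bool:
    """True if the upload is not larger than the configured body limit."""
    return upload_bytes <= parse_nginx_size(max_body_size)


def spills_to_temp_file(upload_bytes: int, body_buffer_size: str) -> bool:
    """True if the body is larger than the in-memory client body buffer."""
    return upload_bytes > parse_nginx_size(body_buffer_size)


if __name__ == "__main__":
    mib = 1024 ** 2
    for upload in (15 * mib, 20 * mib):
        print(
            upload,
            "fits 15m limit:", fits_body_limit(upload, "15m"),
            "larger than 5m buffer:", spills_to_temp_file(upload, "5m"),
        )
```

Under these assumptions, a 20 MB upload is larger than the 15m body limit, and both upload sizes are larger than the 5m buffer, so nginx would write the request body to a temporary file on disk rather than keep it in memory.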
Technologies and versions:
Kubernetes master version 1.13.12-gke.25
Nodes 1.13.11-gke.14
Nginx-ingress-controller 0.25.1
Thank you for your help; I have no idea what else to try.