We have 2 nodes, each with 96 GB of RAM. The plan was that our pods would take 90.5 GB of RAM on one node and 91 GB on the other. What actually happened is that the pods took 93.5 GB on one node and 88 GB on the other. This caused the pods to restart endlessly, and the application never reached a running state.
Background: We are new to Kubernetes and are running version 1.14 on an EKS cluster on AWS (v1.14.9-eks-658790). We have pods of different sizes that together make up 1 unit of our product. On the testing setup we want to run 1 unit, and in production many. Paying for bigger nodes, reducing the pod limits, or reducing the number of copies are all problematic for us.
Details on the pods:
+-------------+------------------+----------------+-------------+
| Pod name    | Mem request (GB) | Pod limit (GB) | # of copies |
+-------------+------------------+----------------+-------------+
| BIG-OK-POD  | 35               | 46             | 2           |
| OK-POD      | 7.5              | 7.5            | 4           |
| A-OK-POD    | 6                | 6              | 8           |
| WOLF-POD    | 5                | 5              | 1           |
| WOLF-B-POD  | 1                | 1              | 1           |
| SHEEP-POD   | 2                | 2              | 1           |
| SHEEP-B-POD | 2                | 2              | 1           |
| SHEEP-C-POD | 1.5              | 1.5            | 1           |
+-------------+------------------+----------------+-------------+
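For context, these numbers correspond to each pod spec's resources block. A minimal sketch for BIG-OK-POD, assuming a single container per pod (the container name and image are placeholders; the GB figures from the table are written as Gi):

    apiVersion: v1
    kind: Pod
    metadata:
      name: big-ok-pod
    spec:
      containers:
        - name: app                               # placeholder container name
          image: registry.example.com/big-ok:1.0  # placeholder image
          resources:
            requests:
              memory: "35Gi"   # request: what the scheduler counts when placing the pod
            limits:
              memory: "46Gi"   # limit: the most the container is allowed to use at runtime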
We don't care where the pods run; we just want each node to be able to handle the memory requirements without failing.
I renamed the pods to make it easier to follow what we expected.
Expected placement:
We expected the WOLF pods to be on one node and the SHEEP pods on the other, while the OK pods would be split between the nodes.
Node 1:
+-------------+----------------+-------------+---------------------+
| Pod name    | Pod limit (GB) | # of copies | Combined limit (GB) |
+-------------+----------------+-------------+---------------------+
| BIG-OK-POD  | 46             | 1           | 46                  |
| OK-POD      | 7.5            | 2           | 15                  |
| A-OK-POD    | 6              | 4           | 24                  |
| WOLF-POD    | 5              | 1           | 5                   |
| WOLF-B-POD  | 1              | 1           | 1                   |
+-------------+----------------+-------------+---------------------+
|             |                |             | TOTAL: 91           |
+-------------+----------------+-------------+---------------------+
Node 2:
+-------------+----------------+-------------+---------------------+
| Pod name    | Pod limit (GB) | # of copies | Combined limit (GB) |
+-------------+----------------+-------------+---------------------+
| BIG-OK-POD  | 46             | 1           | 46                  |
| OK-POD      | 7.5            | 2           | 15                  |
| A-OK-POD    | 6              | 4           | 24                  |
| SHEEP-POD   | 2              | 1           | 2                   |
| SHEEP-B-POD | 2              | 1           | 2                   |
| SHEEP-C-POD | 1.5            | 1           | 1.5                 |
+-------------+----------------+-------------+---------------------+
|             |                |             | TOTAL: 90.5         |
+-------------+----------------+-------------+---------------------+
Actual placement:
Node 1:
+-------------+----------------+-------------+---------------------+
| Pod name    | Pod limit (GB) | # of copies | Combined limit (GB) |
+-------------+----------------+-------------+---------------------+
| BIG-OK-POD  | 46             | 1           | 46                  |
| OK-POD      | 7.5            | 2           | 15                  |
| A-OK-POD    | 6              | 4           | 24                  |
| WOLF-POD    | 5              | 1           | 5                   |
| SHEEP-B-POD | 2              | 1           | 2                   |
| SHEEP-C-POD | 1.5            | 1           | 1.5                 |
+-------------+----------------+-------------+---------------------+
|             |                |             | TOTAL: 93.5         |
+-------------+----------------+-------------+---------------------+
Node 2:
+-------------+----------------+-------------+---------------------+
| Pod name    | Pod limit (GB) | # of copies | Combined limit (GB) |
+-------------+----------------+-------------+---------------------+
| BIG-OK-POD  | 46             | 1           | 46                  |
| OK-POD      | 7.5            | 2           | 15                  |
| A-OK-POD    | 6              | 4           | 24                  |
| WOLF-B-POD  | 1              | 1           | 1                   |
| SHEEP-POD   | 2              | 1           | 2                   |
+-------------+----------------+-------------+---------------------+
|             |                |             | TOTAL: 88           |
+-------------+----------------+-------------+---------------------+
Is there a way to tell Kubernetes that each node should keep 4 GB of memory reserved for the node itself?
After reading Marc ABOUCHACRA's answer, we tried changing the kubelet's system-reserved memory (which was set to 0.2Gi), but for any value higher than 0.3Gi (we tried 0.5Gi, 1Gi, 2Gi, 3Gi and 4Gi) the pods were stuck in the Pending state forever.
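For reference, here is a sketch of the kind of reservation we are trying to set, written as an eksctl nodegroup config (the cluster name, region, instance type and the values themselves are illustrative placeholders; the same settings can also be passed straight to the kubelet via the EKS bootstrap script's --kubelet-extra-args):

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: my-cluster               # placeholder cluster name
      region: us-east-1              # placeholder region
    nodeGroups:
      - name: workers                # placeholder nodegroup name
        instanceType: c5.12xlarge    # placeholder instance type (96 GiB RAM)
        desiredCapacity: 2
        kubeletExtraConfig:
          systemReserved:
            memory: "1Gi"            # memory held back for OS daemons
          kubeReserved:
            memory: "1Gi"            # memory held back for the kubelet and container runtime
          evictionHard:
            memory.available: "500Mi"  # kubelet starts evicting pods below this free-memory threshold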
Update: We found a way to reduce the limits on a few of the pods, and now the system is up and stable (even though 1 of the nodes is at 99% memory). We couldn't get Kubernetes to start everything with the previous config, and we still don't know why.