In Kubernetes sysctl
have been grouped into safe
and unsafe
.
In addition to proper namespacing, a safe sysctl must be properly isolated between pods on the same node. This means that setting a safe sysctl for one pod
- must not have any influence on any other pod on the node
- must not allow to harm the node’s health
- must not allow to gain CPU or memory resources outside of the resource limits of a pod.
By far, most of the namespaced sysctls are not necessarily considered safe. The following sysctls are supported in the safe set:
kernel.shm_rmid_forced
,
net.ipv4.ip_local_port_range
,
net.ipv4.tcp_syncookies
.
By default all safe sysctls
are enabled by default.
All unsafe sysctls
are disabled and need to be allowed manually by cluster admin on each node.
kubelet --allowed-unsafe-sysctls \
'kernel.msg*,net.core.somaxconn' ...
For Minikube, this can be done via the extra-config
flag:
minikube start --extra-config="kubelet.allowed-unsafe-sysctls=kernel.msg*,net.core.somaxconn"...
Only namespaced sysctls can be enabled this way.
This is mentioned on Enabling Unsafe Sysctls k8s documentation.
As for, Setting Sysctls for a Pod:
A number of sysctls are namespaced in today’s Linux kernels. This means that they can be set independently for each pod on a node. Only namespaced sysctls are configurable via the pod securityContext within Kubernetes.
The following sysctls are known to be namespaced. This list could change in future versions of the Linux kernel.
- kernel.shm*
,
- kernel.msg*
,
- kernel.sem
,
- fs.mqueue.*
,
- The parameters under net.*
that can be set in container networking namespace. However, there are exceptions (e.g., net.netfilter.nf_conntrack_max
and net.netfilter.nf_conntrack_expect_max
can be set in container networking namespace but they are unnamespaced).
Sysctls with no namespace are called node-level sysctls. If you need to set them, you must manually configure them on each node’s operating system, or by using a DaemonSet with privileged containers.
Use the pod securityContext to configure namespaced sysctls. The securityContext applies to all containers in the same pod.
This example uses the pod securityContext to set a safe sysctl kernel.shm_rmid_forced
and two unsafe sysctls net.core.somaxconn
and kernel.msgmax
. There is no distinction between safe and unsafe sysctls in the specification.
apiVersion: v1
kind: Pod
metadata:
name: sysctl-example
spec:
securityContext:
sysctls:
- name: kernel.shm_rmid_forced
value: "0"
- name: net.core.somaxconn
value: "1024"
- name: kernel.msgmax
value: "65536"
...
You may be interested in reading following questions on StackOverflow Pros and cons of disabling TCP timestamps and What benefit is conferred by TCP timestamp?.