Environment:
- OpenShift Container Platform - version 4.7

Pod description:
- Number of containers per pod: 3 (for simplicity, let's name them A, B, and C)
- Number of interfaces per pod: 3 (with the help of Multus - https://www.openshift.com/blog/demystifying-multus)
- virtio count: 1, SRIOV VF count: 2
- The SRIOV VFs are taken from an sriov-dpdk CNI based network - https://github.com/openshift/sriov-cni#dpdk-userspace-driver-config
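For context, the secondary networks are attached through NetworkAttachmentDefinitions along these lines (a minimal sketch; the name and config values are illustrative, not copied from my cluster):

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: sriov1-net                          # illustrative name
  annotations:
    # ties this network to the device-plugin resource pool
    k8s.v1.cni.cncf.io/resourceName: openshift.io/sriov1
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "sriov",
    "vlan": 0
  }'
```

A second, analogous NetworkAttachmentDefinition maps to `openshift.io/sriov2`.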
The first two containers (A and B) are not allotted any SRIOV resources in the manifest:
```
resources:
  limits:
    cpu: 100m
    memory: 200Mi
  requests:
    cpu: 100m
    memory: 200Mi
```
The third container (C) is allotted the SRIOV resources:
```
resources:
  limits:
    cpu: 200m
    memory: 500Mi
    openshift.io/sriov1: 1
    openshift.io/sriov2: 1
  requests:
    cpu: 200m
    memory: 500Mi
    openshift.io/sriov1: 1
    openshift.io/sriov2: 1
```
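Putting it together, the relevant part of the pod spec looks roughly like this (pod, image, and network names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-container-pod                 # illustrative name
  annotations:
    # Multus secondary networks; names are illustrative
    k8s.v1.cni.cncf.io/networks: sriov1-net,sriov2-net
spec:
  containers:
  - name: a                                 # no SRIOV resources requested
    image: registry.example.com/app:latest
    resources:
      limits:
        cpu: 100m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 200Mi
  - name: b                                 # no SRIOV resources requested
    image: registry.example.com/app:latest
    resources:
      limits:
        cpu: 100m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 200Mi
  - name: c                                 # the intended SRIOV consumer
    image: registry.example.com/app:latest
    resources:
      limits:
        cpu: 200m
        memory: 500Mi
        openshift.io/sriov1: 1
        openshift.io/sriov2: 1
      requests:
        cpu: 200m
        memory: 500Mi
        openshift.io/sriov1: 1
        openshift.io/sriov2: 1
```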
Problem description:
The first container (A), which was not allotted any SRIOV VFs in its container resources section (i.e., limits and requests), nevertheless gets SRIOV VFs allocated, as shown below:
```
State: Running
Started: Thu, 22 Jul 2021 10:38:35 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 200Mi
openshift.io/sriov1: 1
openshift.io/sriov2: 1
Requests:
cpu: 100m
memory: 200Mi
openshift.io/sriov1: 1
openshift.io/sriov2: 1
Environment:
```
From within container A, I could see the following environment variables:
```
PCIDEVICE_OPENSHIFT_IO_SRIOV1=0000:5e:0a.4
PCIDEVICE_OPENSHIFT_IO_SRIOV2=0000:5e:0d.4
```
The third container (C), which was the one actually meant to receive the SRIOV devices, also gets a pair of SRIOV VFs, as shown below:
```
Limits:
cpu: 200m
memory: 500Mi
openshift.io/sriov1: 1
openshift.io/sriov2: 1
Requests:
cpu: 200m
memory: 500Mi
openshift.io/sriov1: 1
openshift.io/sriov2: 1
Environment:
```
From within the third container (C), I could see the following environment variables:
```
PCIDEVICE_OPENSHIFT_IO_SRIOV1=0000:5e:0a.3
PCIDEVICE_OPENSHIFT_IO_SRIOV2=0000:5e:0c.0
```
These PCI addresses are completely different from what was allocated to the first container; two separate sets of PCI devices were allocated to containers of the same pod.
In addition to the above concern, the VFs allocated to the third container (C) aren't receiving any traffic on the SRIOV interfaces.
Note:
I know that, within a pod, all containers share the same network namespace. But according to my understanding, SRIOV VFs are allocated on a per-container basis, similar to CPU, memory, and disk allotments.
Workaround:
By adjusting the order of the containers in the pod manifest, i.e., by making C the first container, I could see that only the first container (now C) was allocated the SRIOV VFs.
The SRIOV interfaces in container C were then usable, and we were able to run traffic.
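The only change in the working manifest is the container ordering, roughly (same illustrative names as before):

```yaml
spec:
  containers:
  - name: c        # SRIOV consumer moved to the first position
    # cpu/memory and openshift.io/sriov1 / openshift.io/sriov2
    # limits and requests unchanged from the original manifest
  - name: a
  - name: b
```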
Questions:
In the problematic scenario -
- Why is the first container (A) getting VFs allocated to it even though no SRIOV resource was defined in its manifest?
- Why are the VFs allocated to the third container (C) not usable (not receiving any traffic)?
In the working scenario -
- Why does it work?
- Why is the SRIOV resource allocation linked to the order of the containers in the pod manifest?
Thanks in advance for your response :)