
Environment:

OpenShift Container Platform - version 4.7

Pod description:

  1. Number of containers per pod: 3 (for simplicity, let's name them A, B, and C)

  2. Number of interfaces per pod: 3 (with the help of Multus: https://www.openshift.com/blog/demystifying-multus)

    virtio count   - 1
    sriov vf count - 2

    Each SRIOV VF is taken from a sriov-dpdk CNI based network: https://github.com/openshift/sriov-cni#dpdk-userspace-driver-config

  3. The first two containers (A and B) are not allocated any SRIOV resources in the manifest (see the full pod spec sketch after this list):

    resources:
      limits:
        cpu: 100m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 200Mi
    

    The third container (C) is allocated the SRIOV resources:

    resources:
      limits:
        cpu: 200m
        memory: 500Mi
        openshift.io/sriov1:  1
        openshift.io/sriov2:  1
      requests:
        cpu: 200m
        memory: 500Mi
        openshift.io/sriov1:  1
        openshift.io/sriov2:  1
    
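For reference, here is a minimal sketch of the pod spec described above. The pod name, images, and network-attachment-definition names are hypothetical placeholders; only the resources sections are taken from the question:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multus-sriov-pod              # hypothetical name
  annotations:
    # one default/virtio interface plus two SRIOV networks attached via Multus;
    # the attachment names below are placeholders
    k8s.v1.cni.cncf.io/networks: sriov-net1,sriov-net2
spec:
  containers:
  - name: a                           # no SRIOV resources requested
    image: registry.example.com/app-a:latest
    resources:
      limits:
        cpu: 100m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 200Mi
  - name: b                           # no SRIOV resources requested
    image: registry.example.com/app-b:latest
    resources:
      limits:
        cpu: 100m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 200Mi
  - name: c                           # the only container that requests the VFs
    image: registry.example.com/app-c:latest
    resources:
      limits:
        cpu: 200m
        memory: 500Mi
        openshift.io/sriov1: 1
        openshift.io/sriov2: 1
      requests:
        cpu: 200m
        memory: 500Mi
        openshift.io/sriov1: 1
        openshift.io/sriov2: 1
```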

Problem description:

The first container (A), which was not allocated any SRIOV VFs in its container resources section (i.e., limits and requests), still gets SRIOV VFs allocated to it, as shown below:

```
State:          Running
  Started:      Thu, 22 Jul 2021 10:38:35 +0000
Ready:          True
Restart Count:  0
Limits:
  cpu:                  100m
  memory:               200Mi
  openshift.io/sriov1:  1
  openshift.io/sriov2:  1
Requests:
  cpu:                  100m
  memory:               200Mi
  openshift.io/sriov1:  1
  openshift.io/sriov2:  1
Environment:
```

From within container A, I could see the following output in the environment variables:

```
PCIDEVICE_OPENSHIFT_IO_SRIOV1=0000:5e:0a.4
PCIDEVICE_OPENSHIFT_IO_SRIOV2=0000:5e:0d.4
```
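
For reference, a minimal sketch of how these variables can be read from outside the pod (the pod and container names are placeholders):

```
oc exec <pod-name> -c <container-a-name> -- printenv | grep PCIDEVICE_
```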

The third container (C), which was meant to be allocated the SRIOV devices, also gets a pair of SRIOV VFs, as shown below:

```
Limits:
  cpu:                  200m
  memory:               500Mi
  openshift.io/sriov1:  1
  openshift.io/sriov2:  1
Requests:
  cpu:                  200m
  memory:               500Mi
  openshift.io/sriov1:  1
  openshift.io/sriov2:  1
Environment:
```

From within the third container (C), I could see the following content in the environment variables:

```
PCIDEVICE_OPENSHIFT_IO_SRIOV1=0000:5e:0a.3
PCIDEVICE_OPENSHIFT_IO_SRIOV2=0000:5e:0c.0
```

These addresses are completely different from what was allocated to the first container: two separate sets of PCI devices were allocated to the containers of the pod.

In addition to the above concern, the VFs allocated to the third container (C) are not receiving any traffic on the SRIOV interfaces.

Note:

I know that, within a pod, all containers share the same network namespace. But according to my understanding, SRIOV VFs are allocated on a per-container basis, similar to CPU, memory, and disk allotments.
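
One way to cross-check which VFs were actually plumbed into the pod's network namespace by the SRIOV CNI is to read the network-status annotation that Multus writes on the pod, and compare the addresses reported there against the PCIDEVICE_* variables inside each container. The pod name is a placeholder, and whether PCI device details appear in the annotation depends on the SRIOV CNI and device plugin versions in use:

```
oc get pod <pod-name> -o jsonpath='{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/network-status}'
```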

Workaround:

By adjusting the order of the containers in the pod manifest, i.e., by making C the first container of the pod manifest, I could see that only the first container (C) was allocated the SRIOV VFs.

The SRIOV interfaces in container C were now usable, and we were able to run traffic. A sketch of the reordered spec is shown below.
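
A minimal sketch of the reordered containers list, using the same hypothetical names as the pod spec sketch above (cpu/memory fields omitted for brevity):

```yaml
spec:
  containers:
  - name: c                           # moved to the front; the only container requesting VFs
    image: registry.example.com/app-c:latest
    resources:
      limits:
        openshift.io/sriov1: 1
        openshift.io/sriov2: 1
      requests:
        openshift.io/sriov1: 1
        openshift.io/sriov2: 1
  - name: a                           # unchanged apart from its position
    image: registry.example.com/app-a:latest
  - name: b
    image: registry.example.com/app-b:latest
```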

Questions:

In the problematic scenario:

  1. Why is the first container (A) getting VFs allocated to it, even though no SRIOV resource was defined for it in the manifest?

  2. Why are the VFs allocated to the third container (C) not usable (not receiving any traffic)?

In the working scenario:

  1. Why does it work?
  2. Why is the SRIOV resource allocation linked to the order of the containers in the pod manifest?

Thanks in advance for your response :)

  • Can you help in understanding the problem by answering 2 questions: 1) Are you running in `privileged mode`, and is `/dev/` shared to the container? 2) Are you running DPDK inside the container, or on the host, for traffic? – Vipin Varghese Sep 20 '21 at 05:13
