Docker 50% performance hit on cpu intensive code

Question

I'm pretty new at using docker or any containers, so please be gentle if I've missed something obvious that everyone else be already knows. I've searched everywhere I can think of, but haven't seen this issue addressed.

I'm trying to evaluate the performance cost of running a benchmark in docker, and I discovered surprising large differences that don't make sense to me. I created a simple Docker image with this Dockerfile:

FROM ubuntu:18.04

RUN apt -y -q update && apt -y -q install python3 vim strace linux-tools-common \
        linux-tools-4.15.0-74-generic linux-cloud-tools-4.15.0-74-generic

ADD . /workspace
WORKDIR /workspace

And I've got a simple python script for testing:

$ cat cpu-test.py
#!/usr/bin/env python3

import math
from time import time

N = range(10)
N_i = range(1_000)
N_j = range(1_000)
x = 1

start = time()
for _ in N:
    for i in N_i:
        for j in N_j:
            x += -1**j * math.sqrt(i)/max(j,2)
stop = time()
print(stop-start)

and then I compare running it normally to running in a container:

$ ./cpu-test.py
4.077672481536865
$ docker run -it --rm cpu:test ./cpu-test.py
6.113868236541748
$

I was investigating it using perf, which led me to the discovery that I needed --privileged to run perf inside a docker, but then the performance gap disappeared:

$ docker run -it --rm --privileged cpu:test ./cpu-test.py
4.1469762325286865
$

Searching for anything to do with docker and --privileged mostly results in litanies of reasons that I shouldn't use privileged because of security considerations, haven't found anything about severe performance effects on mundane code.

Using perf to compare the with/without privilege runs, they look quite different:

With privilege, the top 5 in the perf report are:

     7.26%  docker   docker            [.] runtime.mapassign_faststr
     6.21%  docker   docker            [.] runtime.mapaccess2
     6.12%  docker   [kernel]          [k] 0xffffffff880015e0
     5.37%  docker   [kernel]          [k] 0xffffffff87faac87
     4.92%  docker   docker            [.] runtime.retake

while running without privilege results in:

    11.11%  docker   docker            [.] runtime.evacuate_faststr
     8.14%  docker   docker            [.] runtime.scanobject
     7.18%  docker   docker            [.] runtime.mallocgc
     5.10%  docker   docker            [.] runtime.mapassign
     4.44%  docker   docker            [.] runtime.growslice

I don't know if that is meaningful though, as I'm not at all familiar with the code of the docker runtime.

Am I doing something wrong? Or is there some special knob I need to turn?

Thanks

What is the version of Docker you are using? – Arnabjyoti Kalita Apr 04 '20 at 13:21 — Arnabjyoti Kalita, Apr 04 '20 at 13:21

Arnabjyoti Kalita · Answer 1 · 2020-04-08T05:29:25.297

The flag seccomp:unconfined when added to the docker run command improves the performance of the python program. seccomp is a linux kernel feature, which can be used to restrict the actions available inside the container, by way of allowing and disallowing certain system calls to be made to the host. This reduces the container's access to the host, and in security terminology, helps in reducing the attack surface of the container. The default seccomp profile disables 44 system calls for running containers including perf_event_open and when you add the flag --security-opt seccomp:unconfined all system calls are enabled for the running container.

Since adding seccomp:unconfined helps the Python program in running at almost 1.5x-2x speed, the first point of analysis would be to look at strace output and see if any system calls are slowing things down, when that flag is not added.

Output with --security-opt seccomp:unconfined flag

strace -c -f -S name docker run -it --rm --security-opt seccomp:unconfined cpu:test ./cpu-test.py

5.4090752601623535
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  2.00    0.000194          32         6         6 access
  0.11    0.000011          11         1           arch_prctl
  0.33    0.000032          11         3           brk
  0.00    0.000000           0         1           capget
  0.10    0.000010           1        16           clone
  0.64    0.000062           4        17           close
  0.00    0.000000           0         5         2 connect
  0.00    0.000000           0         1           epoll_create1
  0.00    0.000000           0        14         2 epoll_ctl
  0.22    0.000021           0        62           epoll_pwait
  0.29    0.000028          28         1           execve
  0.00    0.000000           0         8           fcntl
  0.67    0.000065           8         8           fstat
 68.87    0.006687          22       310        24 futex
  0.02    0.000002           2         1           getgid
  0.00    0.000000           0         3           getpeername
  0.00    0.000000           0         2           getpid
  0.00    0.000000           0         1           getrandom
  0.00    0.000000           0         3           getsockname
  0.10    0.000010           1        17           gettid
  0.02    0.000002           1         2           getuid
  0.00    0.000000           0         5         1 ioctl
  0.00    0.000000           0         1           lseek
  5.83    0.000566           7        84           mmap
  2.12    0.000206           5        39           mprotect
  0.35    0.000034           2        14           munmap
  0.00    0.000000           0        12         9 newfstatat
  1.43    0.000139          10        14           openat
  0.13    0.000013          13         1           prlimit64
 10.21    0.000991          10       102           pselect6
  0.55    0.000053           2        34        10 read
  0.00    0.000000           0         1           readlinkat
  3.14    0.000305           3       120           rt_sigaction
  0.36    0.000035           1        53           rt_sigprocmask
  0.04    0.000004           4         1           sched_getaffinity
  2.04    0.000198           5        42           sched_yield
  0.18    0.000017           1        17           set_robust_list
  0.03    0.000003           3         1           set_tid_address
  0.00    0.000000           0         3           setsockopt
  0.22    0.000021           1        34           sigaltstack
  0.00    0.000000           0         5           socket
  0.00    0.000000           0         7           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.009709                  1072        54 total

Output without --security-opt seccomp:unconfined flag

strace -c -f -S name docker run -it --rm cpu:test ./cpu-test.py

8.161764860153198
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.08    0.000033           6         6         6 access
  0.04    0.000015          15         1           arch_prctl
  0.02    0.000007           2         3           brk
  0.00    0.000000           0         1           capget
  0.22    0.000087           6        15           clone
  0.26    0.000102           6        17           close
  0.04    0.000015           3         5         2 connect
  0.00    0.000000           0         1           epoll_create1
  0.14    0.000054           4        14         2 epoll_ctl
  2.31    0.000916          23        40           epoll_pwait
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         8           fcntl
  0.07    0.000027           3         8           fstat
 72.00    0.028580          99       290        21 futex
  0.01    0.000002           2         1           getgid
  0.01    0.000002           1         3           getpeername
  0.00    0.000000           0         2           getpid
  0.00    0.000000           0         1           getrandom
  0.01    0.000002           1         3           getsockname
  0.10    0.000039           2        16           gettid
  0.01    0.000002           1         2           getuid
  0.01    0.000005           1         5         1 ioctl
  0.00    0.000000           0         1           lseek
  1.33    0.000529           7        80           mmap
  0.72    0.000284           8        37           mprotect
  0.31    0.000125           8        15           munmap
  0.07    0.000026           2        12         9 newfstatat
  0.20    0.000080           6        14           openat
  0.01    0.000003           3         1           prlimit64
 20.04    0.007954          42       189           pselect6
  0.21    0.000085           3        34        10 read
  0.00    0.000000           0         1           readlinkat
  0.46    0.000182           2       120           rt_sigaction
  0.52    0.000207           4        50           rt_sigprocmask
  0.01    0.000004           4         1           sched_getaffinity
  0.27    0.000108           5        20           sched_yield
  0.11    0.000045           3        16           set_robust_list
  0.01    0.000003           3         1           set_tid_address
  0.01    0.000002           1         3           setsockopt
  0.32    0.000127           4        32           sigaltstack
  0.02    0.000008           2         5           socket
  0.09    0.000035           5         7           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.039695                  1082        51 total

Nothing significant yet. So the next thing that was to be analyzed was the Python program itself.

All of the below commands to obtain execution time profiles were run 5 times, and one from that sample space was chosen. There was very minimal variation in the timings.

Running the containers in the background, and then exec-ing into the container,

Output of doing a profile on the Python program running inside container with --security-opt seccomp:unconfined flag

docker exec -it cpu-test-seccomp bash
root@133453c7ccc6:/workspace# python3 -m cProfile ./cpu-test.py 

7.339433908462524
         20000069 function calls in 7.340 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:103(release)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:143(__init__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:147(__enter__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:151(__exit__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:157(_get_module_lock)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:176(cb)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:222(_verbose_message)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:232(_requires_builtin_wrapper)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:307(__init__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:311(__enter__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:318(__exit__)
        4    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:321(<genexpr>)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:369(__init__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:416(parent)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:424(has_location)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:433(spec_from_loader)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:504(_init_module_attrs)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:564(module_from_spec)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:58(__init__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:651(_load_unlocked)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:707(find_spec)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:728(create_module)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:736(exec_module)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:753(is_package)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:78(acquire)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:843(__enter__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:847(__exit__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:870(_find_spec)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:936(_find_and_load_unlocked)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:966(_find_and_load)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:997(_handle_fromlist)
        1    5.540    5.540    7.340    7.340 cpu-test.py:3(<module>)
        3    0.000    0.000    0.000    0.000 {built-in method _imp.acquire_lock}
        1    0.000    0.000    0.000    0.000 {built-in method _imp.create_builtin}
        1    0.000    0.000    0.000    0.000 {built-in method _imp.exec_builtin}
        1    0.000    0.000    0.000    0.000 {built-in method _imp.is_builtin}
        3    0.000    0.000    0.000    0.000 {built-in method _imp.release_lock}
        2    0.000    0.000    0.000    0.000 {built-in method _thread.allocate_lock}
        2    0.000    0.000    0.000    0.000 {built-in method _thread.get_ident}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.any}
        1    0.000    0.000    7.340    7.340 {built-in method builtins.exec}
        4    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        5    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
 10000000    1.228    0.000    1.228    0.000 {built-in method builtins.max}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
 10000000    0.571    0.000    0.571    0.000 {built-in method math.sqrt}
        2    0.000    0.000    0.000    0.000 {built-in method time.time}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {method 'rpartition' of 'str' objects}

Output of doing a profile on the Python program running inside container with no --security-opt flag

docker exec -it cpu-test-no-seccomp bash
root@500724539bd0:/workspace# python3 -m cProfile ./cpu-test.py 
11.848757982254028
         20000069 function calls in 11.849 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:103(release)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:143(__init__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:147(__enter__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:151(__exit__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:157(_get_module_lock)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:176(cb)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:222(_verbose_message)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:232(_requires_builtin_wrapper)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:307(__init__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:311(__enter__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:318(__exit__)
        4    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:321(<genexpr>)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:369(__init__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:416(parent)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:424(has_location)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:433(spec_from_loader)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:504(_init_module_attrs)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:564(module_from_spec)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:58(__init__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:651(_load_unlocked)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:707(find_spec)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:728(create_module)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:736(exec_module)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:753(is_package)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:78(acquire)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:843(__enter__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:847(__exit__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:870(_find_spec)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:936(_find_and_load_unlocked)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:966(_find_and_load)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:997(_handle_fromlist)
        1    8.654    8.654   11.849   11.849 cpu-test.py:3(<module>)
        3    0.000    0.000    0.000    0.000 {built-in method _imp.acquire_lock}
        1    0.000    0.000    0.000    0.000 {built-in method _imp.create_builtin}
        1    0.000    0.000    0.000    0.000 {built-in method _imp.exec_builtin}
        1    0.000    0.000    0.000    0.000 {built-in method _imp.is_builtin}
        3    0.000    0.000    0.000    0.000 {built-in method _imp.release_lock}
        2    0.000    0.000    0.000    0.000 {built-in method _thread.allocate_lock}
        2    0.000    0.000    0.000    0.000 {built-in method _thread.get_ident}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.any}
        1    0.000    0.000   11.849   11.849 {built-in method builtins.exec}
        4    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        5    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
 10000000    2.155    0.000    2.155    0.000 {built-in method builtins.max}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
 10000000    1.039    0.000    1.039    0.000 {built-in method math.sqrt}
        2    0.000    0.000    0.000    0.000 {built-in method time.time}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {method 'rpartition' of 'str' objects}

The timings are slightly high in both the cases, because of the profiling overhead here. But two things are noticeable here -

The builtin math.sqrt and builtins.max functions show almost 1.5-2x differences in their execution time, this difference gets pronounced since these functions get called 10000000 times.
The resultant overall execution time is slower without the flag, as can be seen in the builtins.exec function and their execution times.

To understand this phenomenon more, the math.sqrt as well as the max functions were removed. The below line in cpu-test.py-

x += -1**j * math.sqrt(i)/max(j,2)

was changed to -

x += 1

and the import math line was removed too, thereby reducing a lot of overhead from the import statement.

With --security-opt seccomp:unconfined

root@133453c7ccc6:/workspace# python3 -m cProfile ./cpu-test.py 
0.7199039459228516
         8 function calls in 0.720 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:997(_handle_fromlist)
        1    0.720    0.720    0.720    0.720 cpu-test.py:4(<module>)
        1    0.000    0.000    0.720    0.720 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        2    0.000    0.000    0.000    0.000 {built-in method time.time}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

With no --security-opt seccomp:unconfined

root@500724539bd0:/workspace# python3 -m cProfile ./cpu-test.py
1.0778992176055908
         8 function calls in 1.078 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:997(_handle_fromlist)
        1    1.078    1.078    1.078    1.078 cpu-test.py:4(<module>)
        1    0.000    0.000    1.078    1.078 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        2    0.000    0.000    0.000    0.000 {built-in method time.time}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Also doing a perf record -e ./cpu-test.py after starting the container with --privileged flags, and then doing a perf report, we can see -

Samples: 20K of event 'cycles:ppp', Event count (approx.): 17551108136                                                                                                                                      
Overhead  Command  Shared Object      Symbol                                                                                                                                                                
  14.56%  python3  python3.6          [.] 0x0000000000181c0b
  11.65%  python3  python3.6          [.] _PyEval_EvalFrameDefault
   5.75%  python3  python3.6          [.] PyDict_GetItem
   3.43%  python3  python3.6          [.] PyDict_SetItem
   1.69%  python3  python3.6          [.] 0x0000000000181e45
   1.68%  python3  python3.6          [.] 0x0000000000181c23
   1.59%  python3  python3.6          [.] 0x00000000001705c9
   1.54%  python3  python3.6          [.] 0x0000000000181a88
   1.54%  python3  python3.6          [.] 0x0000000000181bfa
   1.48%  python3  python3.6          [.] 0x0000000000181c56
   1.48%  python3  python3.6          [.] 0x0000000000181c71
   1.42%  python3  python3.6          [.] 0x0000000000181c42
   1.37%  python3  python3.6          [.] 0x0000000000181c8a
   1.28%  python3  python3.6          [.] 0x0000000000181c01
   1.09%  python3  python3.6          [.] _PyObject_GC_New
   0.96%  python3  python3.6          [.] PyNumber_Multiply
   0.63%  python3  python3.6          [.] PyLong_AsDouble
   0.59%  python3  python3.6          [.] PyObject_GetAttr
   0.57%  python3  python3.6          [.] 0x00000000000c4df9
   0.57%  python3  python3.6          [.] 0x0000000000165808
   0.56%  python3  python3.6          [.] PyObject_RichCompare
   0.53%  python3  python3.6          [.] PyNumber_Negative

Most of the time is spent in _PyEval_EvalFrameDefault, which is a fair indication that most of the time is spent by the interpreter executing the byte code.

It would be fair to assume that addition of the --security-opt seccomp:unconfined speeds up the interpreter in executing the byte code. This would require a bit of digging around the Python internals.

Note that, the disassembled output is the same in both the cases, running with --seccomp:unconfined as well as using the default seccomp profile.

Interesting. The original workload that led me to this was actually in perl, which makes me wonder what the two interpreters might have in common that is so affected by the security setting. — OldGeeksGuide, Apr 06 '20 at 07:33

vaskozl · Answer 2 · 2021-07-06T18:56:17.680

As described by Yusuke Endoh in his blog, the heavy slowdown in scripting languages such as python, perl and ruby running under docker (and containerd) with the default seccomp profile appears to come from a kernel vulnerability mitigation called Speculative Store Bypass.

The mitigation suppresses indirect branch prediction (called STIBP). This makes most code much slower as measured by phoronix. The mitigation added in kernel 4.20, but was quickly disabled by default, due to is impact on performance.

To see if the mitigation is on for your kernel, run:

$ cat /sys/devices/system/cpu/vulnerabilities/spec_store_bypass
Mitigation: Speculative Store Bypass disabled via prctl and seccomp

As it turns out, running a program with seccomp enables the infamous STIBP mitigation, making some workloads, such as scripting languages two times slower.

Solutions to the problem (at the cost of security), include running with --security-opt seccomp:unconfined or adding kernel parameters to disable the relevant mitigation:

docker run --security-opt seccomp:unconfined
Booting the kernel with: mitigations=off
Booting the kernel with: spectre_v2_user=off spec_store_bypass_disable=off

Which solution is best depends entirely on the king of things running on the system. If you trust the code in the docker container completely but have untrusted VMs running on the system, you may opt for --security-opt seccomp:unconfined. Otherwise I would back Linus Torvalds and use the kernel parameters to disable the mitigation.

Other possible slowdowns (though not as significant) related to seccomp which one may still experience if running an old kernel/seccomp:

With seccomp 2.5, linux kernel 5.12 and mitigiations=off I am seeing above 99% native performance with docker, compared to less than 80% native performance with the default mitigation settings on Centos/Fedora.

score 2 · Answer 3 · answered Mar 31 '20 at 21:25

From this link:

When the operator executes docker run --privileged, Docker will enable access to all devices on the host as well as set some configuration in AppArmor or SELinux to allow the container nearly all the same access to the host as processes running outside containers on the host. Additional information about running with --privileged is available on the Docker Blog.

I feel that there are security restrictions which when running in privileged mode are practically disabled. I believe that the nature of these security restrictions tend to have performance cost when enabled however this performance cost for the sake of maintaining reasonable security. This would be very visible when running CPU intensive tasks like in your example.

score 1 · Answer 4 · answered Apr 19 '21 at 08:39

even if this question is now kind of old, I though it would still help some people to share our solution.

It seems that the real cause of the problem here is the libseccomp2 version installed on your machine. By upgrading it, apt-get install --only-upgrade libseccomp2, you will be able to enhance your application performances, and avoid setting --security-opt seccomp:unconfined when running your container.

Docker 50% performance hit on cpu intensive code

4 Answers4