The flag seccomp:unconfined
when added to the docker run
command improves the performance of the python program. seccomp
is a linux kernel feature, which can be used to restrict the actions available inside the container, by way of allowing and disallowing certain system calls to be made to the host. This reduces the container's access to the host, and in security terminology, helps in reducing the attack surface
of the container. The default seccomp
profile disables 44 system calls for running containers including perf_event_open
and when you add the flag --security-opt seccomp:unconfined
all system calls are enabled for the running container.
Since adding seccomp:unconfined
helps the Python program in running at almost 1.5x-2x speed, the first point of analysis would be to look at strace
output and see if any system calls are slowing things down, when that flag is not added.
- Output with
--security-opt seccomp:unconfined
flag
strace -c -f -S name docker run -it --rm --security-opt seccomp:unconfined cpu:test ./cpu-test.py
5.4090752601623535
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
2.00 0.000194 32 6 6 access
0.11 0.000011 11 1 arch_prctl
0.33 0.000032 11 3 brk
0.00 0.000000 0 1 capget
0.10 0.000010 1 16 clone
0.64 0.000062 4 17 close
0.00 0.000000 0 5 2 connect
0.00 0.000000 0 1 epoll_create1
0.00 0.000000 0 14 2 epoll_ctl
0.22 0.000021 0 62 epoll_pwait
0.29 0.000028 28 1 execve
0.00 0.000000 0 8 fcntl
0.67 0.000065 8 8 fstat
68.87 0.006687 22 310 24 futex
0.02 0.000002 2 1 getgid
0.00 0.000000 0 3 getpeername
0.00 0.000000 0 2 getpid
0.00 0.000000 0 1 getrandom
0.00 0.000000 0 3 getsockname
0.10 0.000010 1 17 gettid
0.02 0.000002 1 2 getuid
0.00 0.000000 0 5 1 ioctl
0.00 0.000000 0 1 lseek
5.83 0.000566 7 84 mmap
2.12 0.000206 5 39 mprotect
0.35 0.000034 2 14 munmap
0.00 0.000000 0 12 9 newfstatat
1.43 0.000139 10 14 openat
0.13 0.000013 13 1 prlimit64
10.21 0.000991 10 102 pselect6
0.55 0.000053 2 34 10 read
0.00 0.000000 0 1 readlinkat
3.14 0.000305 3 120 rt_sigaction
0.36 0.000035 1 53 rt_sigprocmask
0.04 0.000004 4 1 sched_getaffinity
2.04 0.000198 5 42 sched_yield
0.18 0.000017 1 17 set_robust_list
0.03 0.000003 3 1 set_tid_address
0.00 0.000000 0 3 setsockopt
0.22 0.000021 1 34 sigaltstack
0.00 0.000000 0 5 socket
0.00 0.000000 0 7 write
------ ----------- ----------- --------- --------- ----------------
100.00 0.009709 1072 54 total
- Output without
--security-opt seccomp:unconfined
flag
strace -c -f -S name docker run -it --rm cpu:test ./cpu-test.py
8.161764860153198
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
0.08 0.000033 6 6 6 access
0.04 0.000015 15 1 arch_prctl
0.02 0.000007 2 3 brk
0.00 0.000000 0 1 capget
0.22 0.000087 6 15 clone
0.26 0.000102 6 17 close
0.04 0.000015 3 5 2 connect
0.00 0.000000 0 1 epoll_create1
0.14 0.000054 4 14 2 epoll_ctl
2.31 0.000916 23 40 epoll_pwait
0.00 0.000000 0 1 execve
0.00 0.000000 0 8 fcntl
0.07 0.000027 3 8 fstat
72.00 0.028580 99 290 21 futex
0.01 0.000002 2 1 getgid
0.01 0.000002 1 3 getpeername
0.00 0.000000 0 2 getpid
0.00 0.000000 0 1 getrandom
0.01 0.000002 1 3 getsockname
0.10 0.000039 2 16 gettid
0.01 0.000002 1 2 getuid
0.01 0.000005 1 5 1 ioctl
0.00 0.000000 0 1 lseek
1.33 0.000529 7 80 mmap
0.72 0.000284 8 37 mprotect
0.31 0.000125 8 15 munmap
0.07 0.000026 2 12 9 newfstatat
0.20 0.000080 6 14 openat
0.01 0.000003 3 1 prlimit64
20.04 0.007954 42 189 pselect6
0.21 0.000085 3 34 10 read
0.00 0.000000 0 1 readlinkat
0.46 0.000182 2 120 rt_sigaction
0.52 0.000207 4 50 rt_sigprocmask
0.01 0.000004 4 1 sched_getaffinity
0.27 0.000108 5 20 sched_yield
0.11 0.000045 3 16 set_robust_list
0.01 0.000003 3 1 set_tid_address
0.01 0.000002 1 3 setsockopt
0.32 0.000127 4 32 sigaltstack
0.02 0.000008 2 5 socket
0.09 0.000035 5 7 write
------ ----------- ----------- --------- --------- ----------------
100.00 0.039695 1082 51 total
Nothing significant yet.
So the next thing that was to be analyzed was the Python program itself.
All of the below commands to obtain execution time profiles were run 5 times, and one from that sample space was chosen. There was very minimal variation in the timings.
Running the containers in the background, and then exec
-ing into the container,
- Output of doing a profile on the Python program running inside container with
--security-opt seccomp:unconfined
flag
docker exec -it cpu-test-seccomp bash
root@133453c7ccc6:/workspace# python3 -m cProfile ./cpu-test.py
7.339433908462524
20000069 function calls in 7.340 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:103(release)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:143(__init__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:147(__enter__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:151(__exit__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:157(_get_module_lock)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:176(cb)
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:222(_verbose_message)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:232(_requires_builtin_wrapper)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:307(__init__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:311(__enter__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:318(__exit__)
4 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:321(<genexpr>)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:369(__init__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:416(parent)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:424(has_location)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:433(spec_from_loader)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:504(_init_module_attrs)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:564(module_from_spec)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:58(__init__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:651(_load_unlocked)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:707(find_spec)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:728(create_module)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:736(exec_module)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:753(is_package)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:78(acquire)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:843(__enter__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:847(__exit__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:870(_find_spec)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:936(_find_and_load_unlocked)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:966(_find_and_load)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:997(_handle_fromlist)
1 5.540 5.540 7.340 7.340 cpu-test.py:3(<module>)
3 0.000 0.000 0.000 0.000 {built-in method _imp.acquire_lock}
1 0.000 0.000 0.000 0.000 {built-in method _imp.create_builtin}
1 0.000 0.000 0.000 0.000 {built-in method _imp.exec_builtin}
1 0.000 0.000 0.000 0.000 {built-in method _imp.is_builtin}
3 0.000 0.000 0.000 0.000 {built-in method _imp.release_lock}
2 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock}
2 0.000 0.000 0.000 0.000 {built-in method _thread.get_ident}
1 0.000 0.000 0.000 0.000 {built-in method builtins.any}
1 0.000 0.000 7.340 7.340 {built-in method builtins.exec}
4 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
5 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
10000000 1.228 0.000 1.228 0.000 {built-in method builtins.max}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
10000000 0.571 0.000 0.571 0.000 {built-in method math.sqrt}
2 0.000 0.000 0.000 0.000 {built-in method time.time}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
2 0.000 0.000 0.000 0.000 {method 'rpartition' of 'str' objects}
- Output of doing a profile on the Python program running inside container with no
--security-opt
flag
docker exec -it cpu-test-no-seccomp bash
root@500724539bd0:/workspace# python3 -m cProfile ./cpu-test.py
11.848757982254028
20000069 function calls in 11.849 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:103(release)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:143(__init__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:147(__enter__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:151(__exit__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:157(_get_module_lock)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:176(cb)
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:222(_verbose_message)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:232(_requires_builtin_wrapper)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:307(__init__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:311(__enter__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:318(__exit__)
4 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:321(<genexpr>)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:369(__init__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:416(parent)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:424(has_location)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:433(spec_from_loader)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:504(_init_module_attrs)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:564(module_from_spec)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:58(__init__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:651(_load_unlocked)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:707(find_spec)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:728(create_module)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:736(exec_module)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:753(is_package)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:78(acquire)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:843(__enter__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:847(__exit__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:870(_find_spec)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:936(_find_and_load_unlocked)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:966(_find_and_load)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:997(_handle_fromlist)
1 8.654 8.654 11.849 11.849 cpu-test.py:3(<module>)
3 0.000 0.000 0.000 0.000 {built-in method _imp.acquire_lock}
1 0.000 0.000 0.000 0.000 {built-in method _imp.create_builtin}
1 0.000 0.000 0.000 0.000 {built-in method _imp.exec_builtin}
1 0.000 0.000 0.000 0.000 {built-in method _imp.is_builtin}
3 0.000 0.000 0.000 0.000 {built-in method _imp.release_lock}
2 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock}
2 0.000 0.000 0.000 0.000 {built-in method _thread.get_ident}
1 0.000 0.000 0.000 0.000 {built-in method builtins.any}
1 0.000 0.000 11.849 11.849 {built-in method builtins.exec}
4 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
5 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
10000000 2.155 0.000 2.155 0.000 {built-in method builtins.max}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
10000000 1.039 0.000 1.039 0.000 {built-in method math.sqrt}
2 0.000 0.000 0.000 0.000 {built-in method time.time}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
2 0.000 0.000 0.000 0.000 {method 'rpartition' of 'str' objects}
The timings are slightly high in both the cases, because of the profiling overhead here. But two things are noticeable here -
The builtin math.sqrt
and builtins.max
functions show almost 1.5-2x differences in their execution time, this difference gets pronounced since these functions get called 10000000 times.
The resultant overall execution time is slower without the flag, as can be seen in the builtins.exec
function and their execution times.
To understand this phenomenon more, the math.sqrt
as well as the max
functions were removed. The below line in cpu-test.py
-
x += -1**j * math.sqrt(i)/max(j,2)
was changed to -
x += 1
and the import math
line was removed too, thereby reducing a lot of overhead from the import
statement.
- With
--security-opt seccomp:unconfined
root@133453c7ccc6:/workspace# python3 -m cProfile ./cpu-test.py
0.7199039459228516
8 function calls in 0.720 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:997(_handle_fromlist)
1 0.720 0.720 0.720 0.720 cpu-test.py:4(<module>)
1 0.000 0.000 0.720 0.720 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
2 0.000 0.000 0.000 0.000 {built-in method time.time}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
- With no
--security-opt seccomp:unconfined
root@500724539bd0:/workspace# python3 -m cProfile ./cpu-test.py
1.0778992176055908
8 function calls in 1.078 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:997(_handle_fromlist)
1 1.078 1.078 1.078 1.078 cpu-test.py:4(<module>)
1 0.000 0.000 1.078 1.078 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
2 0.000 0.000 0.000 0.000 {built-in method time.time}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Also doing a perf record -e ./cpu-test.py
after starting the container with --privileged flags
, and then doing a perf report
, we can see -
Samples: 20K of event 'cycles:ppp', Event count (approx.): 17551108136
Overhead Command Shared Object Symbol
14.56% python3 python3.6 [.] 0x0000000000181c0b
11.65% python3 python3.6 [.] _PyEval_EvalFrameDefault
5.75% python3 python3.6 [.] PyDict_GetItem
3.43% python3 python3.6 [.] PyDict_SetItem
1.69% python3 python3.6 [.] 0x0000000000181e45
1.68% python3 python3.6 [.] 0x0000000000181c23
1.59% python3 python3.6 [.] 0x00000000001705c9
1.54% python3 python3.6 [.] 0x0000000000181a88
1.54% python3 python3.6 [.] 0x0000000000181bfa
1.48% python3 python3.6 [.] 0x0000000000181c56
1.48% python3 python3.6 [.] 0x0000000000181c71
1.42% python3 python3.6 [.] 0x0000000000181c42
1.37% python3 python3.6 [.] 0x0000000000181c8a
1.28% python3 python3.6 [.] 0x0000000000181c01
1.09% python3 python3.6 [.] _PyObject_GC_New
0.96% python3 python3.6 [.] PyNumber_Multiply
0.63% python3 python3.6 [.] PyLong_AsDouble
0.59% python3 python3.6 [.] PyObject_GetAttr
0.57% python3 python3.6 [.] 0x00000000000c4df9
0.57% python3 python3.6 [.] 0x0000000000165808
0.56% python3 python3.6 [.] PyObject_RichCompare
0.53% python3 python3.6 [.] PyNumber_Negative
Most of the time is spent in _PyEval_EvalFrameDefault
, which is a fair indication that most of the time is spent by the interpreter executing the byte code.
It would be fair to assume that addition of the --security-opt seccomp:unconfined
speeds up the interpreter in executing the byte code. This would require a bit of digging around the Python
internals.
Note that, the disassembled output is the same in both the cases, running with --seccomp:unconfined
as well as using the default seccomp profile.