While toying around with an example from user_namespaces(7), I've come across a strange behaviour.
What the application does
The application user-ns-ex calls clone(2) with CLONE_NEWUSER, thus creating a new process in a new user namespace. The parent process writes a map (0 1000 1) to the /proc/PID/uid_map file and tells the child (via a pipe) that it can proceed. The child process then execs bash.
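The parent's map-writing step could be sketched roughly as follows. This is a simplified illustration, not the code from the man page: the helper names map_path and write_map are made up here, and error handling is reduced to return codes.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/proc/<pid>/<name>", where name is "uid_map" or "gid_map".
 * Returns 0 on success, -1 if the buffer is too small. */
int map_path(char *buf, size_t len, pid_t pid, const char *name)
{
    assert(buf != NULL && name != NULL);
    int n = snprintf(buf, len, "/proc/%ld/%s", (long)pid, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a single mapping line such as "0 1000 1" to the given map file.
 * Returns 0 on success, -1 on failure (errno is set by open or write). */
int write_map(const char *path, const char *mapping)
{
    assert(path != NULL && mapping != NULL);

    /* This open(2) is the call that fails in the problem case. */
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    size_t len = strlen(mapping);
    ssize_t written = write(fd, mapping, len);
    int saved_errno = errno;   /* close() may overwrite errno */
    close(fd);

    if (written != (ssize_t)len) {
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```

With the child PID from a run, the parent would call map_path(buf, sizeof buf, pid, "uid_map") and then write_map(buf, "0 1000 1"). "Permission denied" corresponds to open(2) inside write_map failing with EACCES.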
I've copied the source code here.
The problem
The application successfully opens /proc/PID/uid_map for writing if I set either no capabilities or all of them on the binary.
When I set only cap_setuid, cap_setgid and optionally cap_sys_admin, the call to open(2) fails:
Set caps:
arksnote linux-namespaces # setcap 'cap_setuid,cap_setgid,cap_sys_admin=epi' ./user-ns-ex
arksnote linux-namespaces # getcap ./user-ns-ex
./user-ns-ex = cap_setgid,cap_setuid,cap_sys_admin+eip
Try to run:
kamyshev@arksnote ~/workspace/personal/linux-kernel/linux-namespaces $ ./user-ns-ex -v -U -M '0 1000 1' bash
./user-ns-ex: PID of child created by clone() is 19666
ERROR: open /proc/19666/uid_map: Permission denied
About to exec bash
And now a successful case:
No capabilities:
arksnote linux-namespaces # setcap '=' ./user-ns-ex
arksnote linux-namespaces # getcap ./user-ns-ex
./user-ns-ex =
Runs OK:
kamyshev@arksnote ~/workspace/personal/linux-kernel/linux-namespaces $ ./user-ns-ex -v -U -M '0 1000 1' bash
./user-ns-ex: PID of child created by clone() is 19557
About to exec bash
arksnote linux-namespaces # exit
I've been trying to find the reason in the man pages and by playing with different capabilities, but with no luck so far. What puzzles me the most is that the application runs with fewer capabilities and fails with more.
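One way to check which capabilities the process actually holds just before the failing open(2) is to read the CapEff line from /proc/self/status. The following is only a minimal sketch under that assumption: the helper names (parse_cap_line, has_cap, read_effective_caps) are made up for illustration and are not part of user-ns-ex, and the capability numbers are the ones defined in <linux/capability.h>.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Capability numbers as defined in <linux/capability.h>. */
enum { CAP_SETGID_NR = 6, CAP_SETUID_NR = 7, CAP_SYS_ADMIN_NR = 21 };

/* Parse a status line such as "CapEff:\t00000000002000c0" for the given key.
 * Returns 0 and stores the hex mask on success, -1 if the line does not match. */
int parse_cap_line(const char *line, const char *key, unsigned long long *mask)
{
    assert(line != NULL && key != NULL && mask != NULL);
    size_t klen = strlen(key);
    if (strncmp(line, key, klen) != 0 || line[klen] != ':')
        return -1;

    const char *start = line + klen + 1;
    char *end;
    unsigned long long value = strtoull(start, &end, 16); /* skips the tab */
    if (end == start)
        return -1;   /* no hex digits after the key */
    *mask = value;
    return 0;
}

/* Return nonzero if capability number cap is set in mask. */
int has_cap(unsigned long long mask, int cap)
{
    return (int)((mask >> cap) & 1ULL);
}

/* Read the effective capability set of the calling process.
 * Returns 0 on success, -1 if the file or the CapEff line is unavailable. */
int read_effective_caps(unsigned long long *mask)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[256];
    int rc = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_cap_line(line, "CapEff", mask) == 0) {
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

Calling read_effective_caps in the parent right before the uid_map is opened, and printing has_cap(mask, CAP_SETUID_NR) and friends, would show whether the file capabilities set with setcap really ended up in the effective set in each of the two cases.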
Can someone help me and clarify the issue?