
When I try to run Apache inside a systemd container started with `systemd-nspawn --private-users=pick ...` (as opposed to `--private-users=false`, as in this solution), I get this error:

Permission denied: AH00072: make_sock: could not bind to address ...:999

What puzzles me is that the container has been granted CAP_NET_BIND_SERVICE (running `getpcaps 1` inside the container confirms this), `systemd-nspawn --capability=help` lists the capability as supported, and `netcat -l 999 -s ...` (also run inside the container) can apparently listen on the same port just fine.
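For reference, `getpcaps` reports the same information the kernel exposes in `/proc/<pid>/status`. The Python sketch below (the function names are illustrative, not from any library) decodes the `CapEff` mask and tests bit 10, which is `CAP_NET_BIND_SERVICE` in `<linux/capability.h>`. Note that the mask itself does not say which user namespace the capability is held in.

```python
CAP_NET_BIND_SERVICE = 10  # capability number from <linux/capability.h>


def effective_caps(pid="self"):
    """Return the effective capability mask of a process as an int,
    parsed from the CapEff line of /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)  # value is printed in hex
    raise RuntimeError("CapEff line not found")


def has_cap(mask, cap):
    """True if capability number `cap` is set in the bitmask `mask`."""
    return bool(mask >> cap & 1)


if __name__ == "__main__":
    mask = effective_caps(1)  # PID 1, as with `getpcaps 1`
    print(f"PID 1 CapEff = {mask:#018x}")
    print("CAP_NET_BIND_SERVICE:", has_cap(mask, CAP_NET_BIND_SERVICE))
```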

What am I missing? Shouldn't the capability allow processes inside the container to bind to well-known ports on the host, no matter which UIDs they run as?

UPDATE: I made a mistake in the netcat invocation. The correct command line is `netcat -vl -p 999 -s ...`, and it now reports "Can't grab ...:999 with bind : Permission denied". So in fact neither Apache nor netcat can bind, and the problem is not Apache-specific. Two more facts about the configuration: the container runs as root (mapped to a non-root UID on the host), and the host's iptables rules are empty.
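A minimal Python equivalent of the netcat check isolates the bind(2) step from any particular application: it tries to bind a TCP socket and returns the errno name (e.g. `EACCES`, whose message is "Permission denied") or `None` on success. `try_bind` is a hypothetical helper, and since the address in the question is elided, the demo call uses `0.0.0.0` (all addresses) as a placeholder.

```python
import errno
import socket


def try_bind(host, port):
    """Try to bind a TCP socket to (host, port), roughly the step that
    `netcat -l -p PORT -s HOST` performs before listening.

    Returns None on success, or the errno name (e.g. "EACCES") on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as e:
            return errno.errorcode.get(e.errno, str(e.errno))
    return None  # socket is closed again when the with-block exits


if __name__ == "__main__":
    result = try_bind("0.0.0.0", 999)  # placeholder address, port from the question
    print("bind OK" if result is None else f"bind failed: {result}")
```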

UPDATE: So perhaps CAP_NET_BIND_SERVICE simply cannot transcend user namespaces.

rookie099
  • Wild guess: are you using SELinux or AppArmor? Also, do you get any more information in the logs? What is your configuration? – Tommiie Nov 23 '20 at 15:24
  • @Tommiie No there is no SELinux or AppArmor. The host runs Debian 10. – rookie099 Nov 23 '20 at 15:30
  • When I check a new install of the latest Debian, it has AppArmor enabled by default. Please share your configuration. – Tommiie Nov 23 '20 at 19:06
  • @Tommie Thanks for the hint about AppArmor; I was not aware of this. So yes, `aa-status` reports that the apparmor module is loaded. I am not sharing other configuration details right now, because there are so many aspects to "configuration". – rookie099 Nov 24 '20 at 07:49

1 Answer


I have come to the conclusion that CAP_NET_BIND_SERVICE specifically, and capabilities in general, cannot transcend user namespaces. From user_namespaces(7):

User namespaces isolate security-related identifiers and attributes, in particular, user IDs and group IDs (see credentials(7)), the root directory, keys (see keyrings(7)), and capabilities (see capabilities(7)).

...

The child process created by clone(2) with the CLONE_NEWUSER flag starts out with a complete set of capabilities in the new user namespace. Likewise, a process that creates a new user namespace using unshare(2) or joins an existing user namespace using setns(2) gains a full set of capabilities in that namespace. On the other hand, that process has no capabilities in the parent (in the case of clone(2)) or previous (in the case of unshare(2) and setns(2)) user namespace, even if the new namespace is created or joined by the root user (i.e., a process with user ID 0 in the root namespace).

...

Having a capability inside a user namespace permits a process to perform operations (that require privilege) only on resources governed by that namespace. In other words, having a capability in a user namespace permits a process to perform privileged operations on resources that are governed by (nonuser) namespaces associated with the user namespace (see the next subsection).
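The man page's claim can be checked directly. The Python sketch below (it needs Python 3.12+ for `os.unshare`; the helper names are hypothetical) forks a child that calls unshare(CLONE_NEWUSER) while staying in the current network namespace, roughly analogous to a container started with `--private-users` but without a private network. The child's `CapEff` should show the full set, including CAP_NET_BIND_SERVICE, yet with default kernel settings binding to port 80 should still fail with `EACCES`, because the network namespace belongs to the parent user namespace. If user namespaces are disabled or blocked (for example by a seccomp filter), the function reports the unshare error instead.

```python
import errno
import os
import socket

CAP_NET_BIND_SERVICE = 10  # capability number from <linux/capability.h>


def _errname(e):
    """Symbolic errno name for an OSError, e.g. "EACCES"."""
    return errno.errorcode.get(e.errno, str(e.errno))


def _cap_eff():
    """Effective capability mask of the calling process, from CapEff."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)
    return 0


def probe_new_userns(port=80):
    """Fork a child that enters a new user namespace but keeps the current
    network namespace; report its CAP_NET_BIND_SERVICE bit and bind result."""
    if not hasattr(os, "unshare"):  # os.unshare was added in Python 3.12
        return {"unshare": "unavailable"}
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: always leaves via os._exit, never returns
        os.close(r)
        msg = "error"
        try:
            os.unshare(os.CLONE_NEWUSER)
            cap = _cap_eff() >> CAP_NET_BIND_SERVICE & 1
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                try:
                    s.bind(("0.0.0.0", port))  # all addresses, current netns
                    bind = "ok"
                except OSError as e:
                    bind = _errname(e)
            msg = f"ok {cap} {bind}"
        except OSError as e:
            msg = _errname(e)  # unshare itself failed, e.g. EPERM
        finally:
            os.write(w, msg.encode())
            os._exit(0)
    os.close(w)
    with os.fdopen(r) as f:
        fields = f.read().split()
    os.waitpid(pid, 0)
    if len(fields) == 3 and fields[0] == "ok":
        return {"unshare": "ok", "cap_net_bind_service": fields[1] == "1",
                "bind": fields[2]}
    return {"unshare": fields[0] if fields else "error"}


if __name__ == "__main__":
    print(probe_new_userns())
```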

From network_namespaces(7):

A physical network device can live in exactly one network namespace.

Since my case involves binding to the host's primary IP address, the relevant physical network device presumably lives in the kernel's initial (root) network namespace. CAP_NET_BIND_SERVICE in my container's user namespace then has no bearing there, because the root network namespace is associated not with that user namespace but with the kernel's initial (root) user namespace. So I am (presumably) out of luck here.
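As a cross-check of this reasoning, the opposite case should work: if a process creates a new user namespace and a new network namespace in the same unshare(2) call, the network namespace is owned by the new user namespace, so the full capability set gained there should be enough to bind to port 80. The sketch below assumes Linux, Python 3.12+ (for `os.unshare`) and that unprivileged user namespaces are allowed; `bind_in_private_netns` is a hypothetical name.

```python
import errno
import os
import socket


def bind_in_private_netns(port=80):
    """Fork a child that calls unshare(CLONE_NEWUSER | CLONE_NEWNET), so the
    new network namespace is owned by the new user namespace, then bind to
    `port` on all addresses. Returns "ok", an errno name from bind, or
    "unshare:<reason>" if the namespaces could not be created."""
    if not hasattr(os, "unshare"):  # os.unshare needs Python 3.12+
        return "unshare:unavailable"
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: always leaves via os._exit
        os.close(r)
        result = "error"
        try:
            try:
                os.unshare(os.CLONE_NEWUSER | os.CLONE_NEWNET)
            except OSError as e:
                result = "unshare:" + errno.errorcode.get(e.errno, str(e.errno))
            else:
                # Bind to 0.0.0.0: loopback starts out down in a fresh
                # network namespace, so 127.0.0.1 is not yet assigned.
                with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                    try:
                        s.bind(("0.0.0.0", port))
                        result = "ok"
                    except OSError as e:
                        result = errno.errorcode.get(e.errno, str(e.errno))
        finally:
            os.write(w, result.encode())
            os._exit(0)
    os.close(w)
    with os.fdopen(r) as f:
        result = f.read()
    os.waitpid(pid, 0)
    return result


if __name__ == "__main__":
    print(bind_in_private_netns())
```

A socket in such a private network namespace is of course not reachable via the host's primary IP address, which is exactly the limitation described above.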

rookie099
  • It sounds like this is the answer to your question, so you can mark your own answer as the accepted solution. – Tommiie Nov 24 '20 at 18:15