
Is it possible to create a bind mount to the parent namespace when creating a container?

I have code that does the following operations (error checking stripped):

struct clone_args cloneArguments;
memset(&cloneArguments, 0, sizeof(cloneArguments));
cloneArguments.flags = CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWUSER;
pid_t child = syscall(SYS_clone3, &cloneArguments, sizeof(cloneArguments));
if(child){
    // Go do parent things
    return;
}

mkdir("container", S_IRWXU);
mount(NULL, "container", "tmpfs", 0, NULL);
mkdir("container/bin", S_IRWXU);
copyFile("busybox", "container/bin", S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
mkdir("container/lib", S_IRWXU);
mount("/usr/lib", "container/lib", NULL, MS_BIND, NULL);
mkdir("container/oldRoot", S_IRWXU);
syscall(SYS_pivot_root, "container", "container/oldRoot");
chdir("/");
umount2("/oldRoot", MNT_DETACH);
remove("/oldRoot");
mount(NULL, "/", NULL, MS_REMOUNT | MS_BIND | MS_RDONLY, NULL);

child = fork();
if(child){
    waitpid(child, NULL, 0);
}else{
    // BusyBox picks its applet from argv[0], so passing "/bin/find" instead of the binary path as the first argument is intentional here.
    execl("/bin/busybox", "/bin/find", "/", NULL);
    exit(errno);
}

All of the error checks pass and busybox runs. It lists every file in the container except the host files under the bind-mounted /usr/lib. For that directory it instead prints `find: /lib: Value too large for defined data type` to stderr.
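
The message that find prints is the C library's strerror() text for an errno value. On glibc, "Value too large for defined data type" corresponds to EOVERFLOW. A minimal sketch for confirming that mapping on a given system follows. The helper name errno_for_message and the scan limit of 256 are assumptions made for illustration and are not part of the original code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: returns the first errno value whose strerror() text
 * equals `message`, or 0 if no value below the assumed limit of 256 matches.
 * `message` should live in the caller's own buffer, because strerror() is
 * allowed to reuse its internal buffer between calls. */
int errno_for_message(const char *message) {
    assert(message != NULL);
    for (int code = 1; code < 256; code++) {
        if (strcmp(strerror(code), message) == 0) {
            return code;
        }
    }
    return 0;
}
```

On a glibc system, passing the message quoted above should return the value of EOVERFLOW. Other C libraries may word the message differently, in which case the function returns 0.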

Is there a way to make this work with bind mounts? The old root still exists in the parent namespace (otherwise my machine would crash pretty quickly), so the error makes no sense to me.

gudenau
  • Why the C tag? That's not C code. – kaylum Dec 30 '21 at 05:34
  • Pseudocode; the project is using C. I just figured writing it out like this would be a little easier to read. I'll replace it with C in a little bit. – gudenau Dec 30 '21 at 05:51
  • Have you tried allocating a stack in the parameters passed to clone3() ? – Rachid K. Dec 30 '21 at 07:52
  • From the manual page: "The stack for the child process is specified via cl_args.stack, which points to the lowest byte of the stack area, and cl_args.stack_size, which specifies the size of the stack in bytes. In the case where the CLONE_VM flag (see below) is specified, a stack must be explicitly allocated and specified. Otherwise, these two fields can be specified as NULL and 0, which causes the child to use the same stack area as the parent (in the child's own virtual address space)." – gudenau Dec 30 '21 at 23:40
  • If you strace `find`, which syscall is it that's returning `EOVERFLOW`? – Joseph Sible-Reinstate Monica Jan 01 '22 at 21:42
  • Unsure, it's one of the ones that busybox is using in its find applet. – gudenau Jan 03 '22 at 02:38
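
If strace is not convenient inside the container, one way to narrow down which call returns EOVERFLOW is to issue the calls a directory walker typically makes and report the first one that fails. The sketch below is only an illustration: probe_directory is a hypothetical helper, and the assumption that find relies on lstat(), opendir() and readdir() has not been checked against the BusyBox source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: tries lstat(), opendir() and readdir() on `path` and
 * returns the errno of the first call that fails, or 0 if all succeed.
 * `*failed_call` is set to the name of the failing call, or NULL on success. */
int probe_directory(const char *path, const char **failed_call) {
    assert(path != NULL && failed_call != NULL);
    struct stat info;
    *failed_call = NULL;

    if (lstat(path, &info) != 0) {
        *failed_call = "lstat";
        return errno;
    }

    DIR *dir = opendir(path);
    if (dir == NULL) {
        *failed_call = "opendir";
        return errno;
    }

    /* readdir() returns NULL both at end-of-directory and on error, so errno
     * is cleared before every call to tell the two cases apart. */
    errno = 0;
    while (readdir(dir) != NULL) {
        errno = 0;
    }
    int readdir_errno = errno;
    closedir(dir);

    if (readdir_errno != 0) {
        *failed_call = "readdir";
        return readdir_errno;
    }
    return 0;
}
```

Calling probe_directory("/lib", &name) from inside the container and printing the returned value with strerror() would show whether the failure comes from one of these three calls. The probe would probably have to be statically linked and copied into the tmpfs alongside busybox, since /lib is the directory under suspicion.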

0 Answers