1

I have a requirement. My process has to fork and then exec another process during one of its code paths. The child process runs some checks, and when some condition is true it has to re-exec itself. This did not cause any performance issues when I tested on high-end machines.

But is it expensive to call execv() again in the same process, especially when the process is exec()ing itself?

Note: There is no fork() involved for the second time. The process would just execv() itself for the second time, to get something remapped in its virtual address space.

Bose
  • As a side note remember that the parent (now grandparent) will only see that the child process has exited but will not have the PID of the grandchild process so won't know to reap it unless it does a blind `wait()` or `waitpid()` without a specific PID – inetknght Sep 25 '15 at 15:19
  • The second time I am not forking: the process just execs itself, without fork, to get something remapped in its virtual address space. So I would assume the PID stays the same – Bose Sep 25 '15 at 15:22
  • What kind of program, of process? What is the approximate size of the executable and application? Is your case pathological? – Basile Starynkevitch Sep 25 '15 at 16:22
  • @BasileStarynkevitch the size of the executable is ~500K – Bose Sep 25 '15 at 18:02
  • How often do you expect to `execve`? Twice a second, or every millisecond? – Basile Starynkevitch Sep 25 '15 at 18:06
  • @BasileStarynkevitch Twice in total, no more than that. Once by the parent when it forks the child and does execv(), and once inside the child, which execv()s itself upon some condition. – Bose Sep 25 '15 at 18:24
  • I was asking about the *frequency*: how often you do that while your app runs for a few minutes. Doing half a dozen `execve` calls is harmless ... – Basile Starynkevitch Sep 25 '15 at 18:27
  • @BasileStarynkevitch Thanks, got the answer. There is no specific frequency as such: the user fires one command, and the process responsible for this command needs a service from another process. So it forks and execs the child, and the child execs itself upon some condition. The frequency is definitely not more than one command (mentioned above) per minute. So I would assume it is perfectly harmless. – Bose Sep 25 '15 at 18:31
  • So it does not matter in practice – Basile Starynkevitch Sep 25 '15 at 18:32
  • Thanks a lot. Got clarifications now.. – Bose Sep 25 '15 at 18:33

2 Answers

4

The second execv() call is no more expensive than the first. It might even be cheaper, since the system might not need to read the program image from disk, and should not need to load any new dynamic libraries.

On the other hand, execv() is considerably more expensive than simply branching within the same program. I'm having trouble imagining a situation in which I would want to write a program that re-execs itself (without forking) instead of just calling a function.

On the third hand, "cheap" and "expensive" are relative. Unless you are doing this a lot, you probably won't actually notice any difference.
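
To make the pattern concrete, here is a minimal sketch of it in Python, whose os.fork, os.execv and os.waitpid are thin wrappers over the system calls discussed here. The parent forks and execs a child; the child then execs itself once more without forking. The temporary script and the REEXEC_FIRST_PID environment variable are hypothetical names chosen for this illustration: the variable keeps the child from re-exec-ing forever and lets it check that its PID survived the exec.

```python
import os
import sys
import tempfile

# Child program (illustrative). On its first pass it records its PID in a
# hypothetical environment variable and re-execs itself; on the second pass
# it exits 0 only if the PID is unchanged.
CHILD_SOURCE = '''\
import os, sys
first_pid = os.environ.get("REEXEC_FIRST_PID")
if first_pid is None:
    # First pass: remember the PID, then replace this process image with itself.
    os.environ["REEXEC_FIRST_PID"] = str(os.getpid())
    os.execv(sys.executable, [sys.executable] + sys.argv)
# Second pass: same process, fresh address space.
sys.exit(0 if int(first_pid) == os.getpid() else 1)
'''


def run_child_that_reexecs():
    """Fork, exec the child script, and return (forked_pid, reaped_pid, exit_code)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(CHILD_SOURCE)
        script = f.name
    try:
        pid = os.fork()
        if pid == 0:
            # Forked child: this is the first exec.
            try:
                os.execv(sys.executable, [sys.executable, script])
            finally:
                os._exit(127)  # only reached if execv failed
        # Parent: wait on the PID returned by fork.
        reaped_pid, status = os.waitpid(pid, 0)
        exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
        return pid, reaped_pid, exit_code
    finally:
        os.unlink(script)
```

Calling run_child_that_reexecs() should return the same PID twice and an exit code of 0: execv replaces the process image but keeps the PID, so the parent can still waitpid on the PID it got from fork.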

Gangadhar
John Bollinger
  • Example for a program that exec-s itself: a PID=1 (= init process) that wants to restart itself after an update. And there are countless other useful applications for a self-exec. – datenwolf Sep 25 '15 at 15:27
  • @datenwolf, I cannot imagine a situation in which I would want to write `init` :-). Seriously, though, that's a reasonable scenario, albeit a rather specialized one. – John Bollinger Sep 25 '15 at 15:43
4

The execve syscall is somewhat expensive; it would be unreasonable to run it more than a few dozen, or perhaps a few hundred, times per second (even though it probably lasts a few milliseconds, or perhaps only a fraction of a millisecond, most of the time).

It is probably faster (and cleaner) than the dozen or so equivalent calls to mmap(2) (plus munmap(2) and mprotect(2)) and setcontext(3) that you would need to nearly mimic it. You would then also have to deal with killing the running threads other than the one doing the execve, and with other resources attached to the process, e.g. FD_CLOEXEC-ed file descriptors.

(You won't be able to replicate exactly what execve does with mmap, munmap, setcontext and close; you might get close enough, but that would be ridiculous.)
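
As one illustration of the per-process state that execve handles for you, the sketch below (Python, under the assumption of a Linux-like system; the probe program and the function name are made up for this example) checks whether a pipe descriptor is still open in the new program image. A descriptor with FD_CLOEXEC set is closed by the exec; one with the flag cleared survives it.

```python
import os
import sys

# Probe program (illustrative): exits 0 if the descriptor number passed as
# its argument is open and refers to a pipe, 1 otherwise.
PROBE = '''\
import os, stat, sys
try:
    mode = os.fstat(int(sys.argv[1])).st_mode
except OSError:
    sys.exit(1)
sys.exit(0 if stat.S_ISFIFO(mode) else 1)
'''


def fd_survives_exec(inheritable):
    """Return True if a pipe descriptor is still open after fork + execv."""
    read_end, write_end = os.pipe()  # Python creates both ends close-on-exec
    # True clears FD_CLOEXEC on the write end; False leaves it set.
    os.set_inheritable(write_end, inheritable)
    pid = os.fork()
    if pid == 0:
        try:
            os.execv(sys.executable,
                     [sys.executable, "-c", PROBE, str(write_end)])
        finally:
            os._exit(127)  # only reached if execv failed
    _, status = os.waitpid(pid, 0)
    os.close(read_end)
    os.close(write_end)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Here fd_survives_exec(True) is expected to report that the descriptor is still open in the exec-ed program, and fd_survives_exec(False) that it was closed by the exec.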

Also, the practical cost of execve should take into account the dynamic loading of the shared libraries (which are loaded before main runs, but technically after the execve syscall) and their startup.

The question might not mean much: the answer depends heavily on the actual state of the machine and on the executable being execve-d. I guess that execve-ing a huge ELF binary, e.g. one linked with hundreds of shared libraries, takes much longer than execve-ing the usual /bin/sh. (Some executables might have a gigabyte of code segment; e.g. the mythical Google crawler is rumored to be a monolithic program with a billion lines of C++ source code, and at some point it was statically linked.)

I also guess that execve from a process with a terabyte-sized address space takes much longer than the usual execve that my zsh shell does on my desktop.

A typical reason for a program to execve itself (actually, some updated version of itself) arises inside a long-running server, when the server's binary executable has been updated.

Another reason to execve its own program is to have a more-or-less "stateless" server (some web server for static content) restart itself and reload its configuration files.

More generally, this is an entire research subject: read about dynamic software updating, application checkpointing, persistence, etc... See also the references here.

It is the same for dumping a core(5) file: in my life, I never saw a core dump lasting more than a fraction of a second, but I did hear that on early-1990s Cray computers a core dump could (pathologically) last half an hour. So I imagine that some pathological execve could last quite a long time (e.g. bringing a terabyte of code segment into RAM, using C-O-W techniques; this is not counted as execve time, but it is part of the cost of starting a program; and you might also have many relocations for many shared libraries).

Addenda

For a small executable (less than a few megabytes), you can afford several hundred execve calls per second, so it is not a big deal in practice. Notice that a shell script running usual commands like ls, mv, ... is execve-ing quite a lot (very often after some fork, which it does for nearly every command). If you suspect some issues, you could benchmark (e.g. with strace(1) using strace -tt -T -f....). On my desktop (Debian/x86-64/Sid, i7 3770K), an execve of /bin/ls (measured with strace -T -f -tt zsh-static -c ls) takes about 250 µs (for an ELF binary executable /bin/ls of 118 Kbytes, which is probably already in the page cache), and for ocamlc (a binary of 1.8 Mbytes) about 1.3 ms; a malloc usually takes from half a µs to a few µs; a call to time(2) takes about 3 ns (avoiding the overhead of a syscall through vdso(7)).
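
If strace is not at hand, a rough measurement can also be taken from inside a program. The sketch below is an illustration in Python (the function name and the default of 20 runs are arbitrary choices for this example): it times a complete fork + execv + wait cycle. That figure also includes the fork, the dynamic loading, the program's own run time and its exit, so treat it as an upper bound on the execve cost rather than as the cost of the syscall alone.

```python
import os
import time


def time_fork_exec(argv, runs=20):
    """Return the mean wall-clock seconds of one fork + execv + wait of argv.

    argv[0] must be the full path of the executable.
    """
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        pid = os.fork()
        if pid == 0:
            try:
                os.execv(argv[0], argv)
            finally:
                os._exit(127)  # only reached if execv failed
        _, status = os.waitpid(pid, 0)
        total += time.perf_counter() - start
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            raise RuntimeError("child did not exit cleanly")
    return total / runs
```

For example, time_fork_exec(["/bin/ls", "/"]) gives a per-run average for ls, assuming /bin/ls exists at that path; the numbers will vary with the machine and with what is already in the page cache.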

Community
Basile Starynkevitch