The essential issue here is that when bash starts up a subshell, it just clones itself rather than executing a new shell from scratch. That means that the subshell is born with all the temporary data structures allocated in the parent shell.
Doing it this way is required for the subshell to inherit the current execution environment: shell functions and variables, and other shell settings. It is also generally more efficient, since it avoids the shell startup costs, which are considerable.
Unix copy-on-write (COW) semantics avoid some of the memory costs of duplicating all these data structures. But since COW works on complete pages, not individual allocations, it is not going to be able to completely avoid copies.
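As a small illustration of that inheritance, here is a minimal sketch; the variable and function names are made up for the example. The command substitution and the parenthesized group each run in a forked copy of the shell, so they see the parent's unexported state, and nothing they change flows back:

```bash
#!/bin/bash
# Hypothetical names, for illustration only.
greeting="hello from the parent"
shout() { printf '%s!\n' "$1"; }

# A command substitution runs in a subshell, which is a fork of the
# running shell: it sees the variable and the function even though
# neither one was exported.
child_output=$(shout "$greeting")
echo "$child_output"

# Assignments made in a subshell only change the subshell's own copy.
( greeting="changed in the child" )
echo "$greeting"
```

A freshly started shell (for example `bash -c 'shout hi'`) would not see `shout` at all, which is why the clone is needed.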
One simple thing you can do to reduce memory consumption is to change your for loop to a computed for, which looks a lot like a C for with extra parentheses:
for ((i=0; i<5000; ++i)); do
Your for loop (for i in $(seq 5000); do) has to start by expanding the output of seq 5000 into a string (of about 24 KB) and then splitting it into 5000 words, each of which is a separate allocation, plus a 5000-element vector of pointers. Allocation overhead means that each word is going to cost more than 40 bytes, even though each string is at most 5 bytes long. Since these are individual allocations, they get scattered around a bit, and other allocations will be made in the same VM pages, triggering COW.
Although these numbers seem small, you are multiplying everything by making N clones of the shell, each with its own N-word vector, so total memory consumption is quadratic in N. With N = 5000 that is 25 million words, which adds up to a lot even if each word only occupies a few bytes: at 40 bytes each, that's a gigabyte. And quadratic growth makes it increase rapidly.
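The arithmetic behind that estimate can be reproduced with a short sketch. The 40 bytes per word is the rough figure used above, not a measured value, and the variable names are only for illustration:

```bash
#!/bin/bash
n=5000
per_word=40   # assumed cost of one word in bytes (the rough estimate above)

# Size of the expanded list and the number of words it splits into.
# Wrapping in $(( )) strips any padding that wc adds on some systems.
list_bytes=$(( $(seq "$n" | wc -c) ))
list_words=$(( $(seq "$n" | wc -w) ))

# N subshells, each born with its own copy of the N-word list.
total_words=$(( n * n ))
total_bytes=$(( total_words * per_word ))

echo "one list: $list_bytes bytes in $list_words words"
echo "all clones: $total_words words, $(( total_bytes / 1000000 )) MB at $per_word bytes per word"
```

This is a worst case, since it assumes COW saves nothing; the measurements further down show what actually happens.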
When I tried the change to the for statement, it saved (in total) about a third of the used memory.
That's a big win for little effort but it doesn't really address the underlying problem. The parent shell also needs to keep track of all the children it spawns, and it does that by keeping a little data about each child. That memory structure is modified each time a new child is spawned, so each new child is born with a different data structure. In this case, COW doesn't help at all, and the total memory consumption will be strictly quadratic.
Fixing that will depend on what you actually do inside the loop.
As suggested by Charles Duffy in a (now-deleted) comment, a simple fix is to remove the parallel task from the job table using the disown command:
for ((i=0; i<5000; ++i)); do
    (
        echo "$i"
        sleep 100
    ) &
    disown
done
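To see what disown does to the shell's bookkeeping, here is a minimal sketch that starts one background process and looks at the job table before and after. It is only a demonstration of the builtin, not part of the measurements:

```bash
#!/bin/bash
sleep 30 &
pid=$!

before=$(jobs -p)   # the background job is listed in the job table
disown              # with no argument, removes the current job from the table
after=$(jobs -p)    # the table is empty again

# The process itself is untouched; only the shell's record of it is gone.
still_running=no
kill -0 "$pid" 2>/dev/null && still_running=yes

echo "before: ${before:-none}, after: ${after:-none}, still running: $still_running"
kill "$pid"         # clean up the demo process
```

The trade-off is that the shell can no longer refer to a disowned process as a job, so keep the PID from `$!` if you still need to signal it later.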
On the other hand, if all you are doing is starting up an external command -- or even if that is the last thing you do and everything else is quite fast -- you could use exec to replace the subshell memory image with the external command:
for ((i=0; i<5000; ++i)); do
    (
        echo "$i"
        exec sleep 100
    ) &
done
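A quick way to convince yourself that exec replaces the subshell rather than starting yet another process is to compare PIDs. This is a sketch using sh as a stand-in for the external command; the variable names are illustrative:

```bash
#!/bin/bash
# The command substitution is a forked subshell. It prints its own PID,
# then execs sh, which prints the PID it is running under.
out=$(
    echo "$BASHPID"
    exec sh -c 'echo "$$"'
)

subshell_pid=${out%%$'\n'*}   # first line: PID of the bash subshell
exec_pid=${out##*$'\n'}       # last line: PID reported by the exec'd sh

echo "subshell PID: $subshell_pid, PID after exec: $exec_pid"
```

The two numbers should be equal: the process is the same, but its memory image (including the copied shell data structures) has been replaced by the new program.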
You could even exec a full script, as long as you run it with a less memory-intensive shell such as dash.
Experimental results (total process size in kilobytes). The "Fix for" column only changes the for loop; each "+" column adds one further change on top of that fix:

   N  Original   Fix for  + disown  + exec sleep  + exec dash
4000   4655956   3148792   1601428       1233212      1265224
5000   6768896   4404432   2001428       1541460      1581540
6000   9241116   5837660   2401428       1849692      1897768
7000  12056056   7443052   2801428       2158752      2213992
8000  15235688   9220568   3201428       2466104      2530180
It's pretty clear that the first two result columns are roughly quadratic in N, and the last three are linear.
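One way to check that claim is to compare the totals at N=4000 and N=8000: doubling N should double a linear quantity and quadruple a purely quadratic one. The sketch below only reuses the two rows from the results above:

```bash
#!/bin/bash
# Rows copied from the results: N followed by the five totals in KB.
ratios=$(awk '
    $1 == 4000 { for (c = 2; c <= NF; c++) small[c] = $c }
    $1 == 8000 {
        # Print total(8000) / total(4000) for each column.
        for (c = 2; c <= NF; c++)
            printf "%s%.2f", (c > 2 ? " " : ""), $c / small[c]
        print ""
    }
' <<'EOF'
4000  4655956 3148792 1601428 1233212 1265224
8000 15235688 9220568 3201428 2466104 2530180
EOF
)
echo "$ratios"   # prints: 3.27 2.93 2.00 2.00 2.00
```

The last three ratios are 2, as expected for linear growth. The first two fall between 2 and 4, which is what a quadratic term plus a sizeable linear term looks like at these values of N.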
I used the following helper to collect those statistics; you can see the precise loop in the various case clauses. For all tests, the number of processes whose sizes were summed was N+1 (so it includes the driver):
#!/bin/bash
# $1 selects the variant, $2 is N (the number of background subshells).
# Each case prints "<process count> <total size in KB>" on stderr.
# The grep pattern matches the forked copies of this script by command line,
# so it presumably relies on the script name containing "subshell".
case $1 in
  o*)
    printf "Original: " >&2
    for i in $(seq "$2"); do ( echo $i; sleep 10; )& done
    ps -osize=,cmd= | grep '[s]ubshell' | awk '{s+=$1}END{print NR, s}' 1>&2
    sleep 15
    ;;
  f*)
    printf "Fix for loop: " >&2
    for ((i = 0; i < $2; ++i)); do ( echo $i; sleep 10; )& done
    ps -osize=,cmd= | grep '[s]ubshell' | awk '{s+=$1}END{print NR, s}' 1>&2
    sleep 15
    ;;
  d*)
    printf "Also disown: " >&2
    for ((i = 0; i < $2; ++i)); do ( echo $i; sleep 10; )& disown; done
    ps -osize=,cmd= | grep '[s]ubshell' | awk '{s+=$1}END{print NR, s}' 1>&2
    sleep 15
    ;;
  e*)
    printf "Exec external: " >&2
    for ((i = 0; i < $2; ++i)); do ( echo $i; exec sleep 10; )& done
    # The children are now sleep processes, so select them by command name
    # and add the driver itself with -p$$.
    ps -p$$ -Csleep -osize= | awk '{s+=$1}END{print NR, s}' 1>&2
    sleep 15
    ;;
  a*)
    printf "Exec dash: " >&2
    for ((i = 0; i < $2; ++i)); do ( exec /bin/dash -c "echo $i; sleep 10"; )& done
    ps -p$$ -Cdash -osize= | awk '{s+=$1}END{print NR, s}' 1>&2
    sleep 15
    ;;
  *)
    echo "First argument should be original, forloop, disown, exec or ash."
    ;;
esac