
I've read that scripts that spawn subshells are slow, which would explain why my script is slow.

For example, here I'm running a loop that gets a number from an array. Does this spawn a subshell every time, and can it be solved without using subshells?

mmode=1
modes[1,2]="9,12,18,19,20,30,43,44,45,46,47,48,49"
until [[ -z $kik ]];do
    ((++mloop))
    kik=$(echo  ${modes[$mmode,2]} | cut -d "," -f $mloop)
    
    filename=$(basename "$f")
    # are all these lines
    
    xcolorall=$((xcolorall+storednr))
    # also triggering
    
    pros2=$(echo "100/$totpix*$xcolorall" | bc -l) 
    IFS='.' read -r pros5 pros6 <<< "$pros2"
    procenthittotal2=$pros5.${pros6:0:2}
            
    # subshells? And if so,
    # is it possible to circumvent them?
    # and more of the same code..
done

Update: the pros2 variable calculates a percentage (what percentage xcolorall is of totpix), and the kik variable gets a number from the modes array, telling the loop which color it should count in this iteration. I suspect these are the main hogs. Is there any way to do this without subshells?

Socowi
  • That depends on your shell. It looks like this is bash, which will fork for any $(...), whereas ksh93, for instance, won't. See e.g. https://unix.stackexchange.com/a/421028 – firmament Dec 16 '20 at 19:33
  • The basic point is that `bash` at its best is used to invoke other tools with appropriate inputs. If you describe your inputs better, maybe someone can come up with an approach that does not require this bash-looping solution. – liborm Dec 16 '20 at 19:35
  • I'm with liborm: I don't think the subshell is the problem, but the loop is. Loops in bash are slow, with or without subshells. Often there are ways to cope without loops. – Socowi Dec 16 '20 at 19:45
  • Note: `$(( ))` does not spawn a subshell; it's an arithmetic expression parsed directly in the main shell. On the other hand, `var=$(echo something | somecommand)` creates *three* subprocesses: a subshell to manage the pipe, another to do the `echo`, and another to run `somecommand`. `var=$(somecommand <<<"something")` only creates two (warning: `<<<` is a bash-only feature). – Gordon Davisson Dec 16 '20 at 19:53
  • @Socowi On my system, a 10,000 iteration loop takes 0.03 seconds without a subshell and 7.66 seconds with one. That's a 250x difference for a single fork, and this question has several (5-7). – that other guy Dec 16 '20 at 19:53
  • define `slow` (5 seconds? 4 minutes? 2.5 hours?) ... probably depends on what `#and more of the same code` consists of; the `$(...)` (single `(` and `)` bookends) are sub-shells; `filename=$(...)` could probably be replaced with some parameter expansions (will depend on format of `$f`); instead of an `until` loop I'd probably use `while/IFS/read` to directly parse the comma-delimited string, thus eliminating all the sub-process calls to repeatedly set `kik` – markp-fuso Dec 16 '20 at 19:55
  • the `filename=$(...) / xcolorall=$((...)) / pros2=$(...) / loc0=$((..)` assignments are being executed for each pass through the loop; but there's no indication (in the code snippet provided) that these assignments are changing for each pass through the loop; in the original code I'd make sure these types of assignments (2 of which are making sub-process calls) are moved up/before the loop if their values never change inside the loop – markp-fuso Dec 16 '20 at 20:10
  • Some things you need external tools for (like bc, bash can't do floating point math), but other things (like cut and basename) bash can do. You don't actually show the loop or the data you're looping over: add more details to your question. – glenn jackman Dec 16 '20 at 21:13
  • Updated the question with more information. – Adam Larsson Dec 17 '20 at 06:48
  • @markp-fuso I'm pretty new to scripting and haven't used IFS/read before. – Adam Larsson Dec 17 '20 at 06:51
  • @Socowi are there any other ways to do loops, or do I just write the same function over and over so it simply runs down the script, and then use some kind of condition to skip once all numbers have been processed? – Adam Larsson Dec 17 '20 at 06:54
  • @AdamLarsson It always depends on the task. Repeating the same function manually won't help and only bloats your script. What I meant was something like replacing `for i in "$@"; do echo "$i"; done` by `printf %s\\n "$@"` to give a very simplified example. Here, I'm afraid you actually need a loop. But if speed matters, implement that loop in another language. – Socowi Dec 17 '20 at 10:50
  • @Socowi so bash can execute other languages too? – Adam Larsson Dec 17 '20 at 11:06
  • Maybe take a step back and explain what you are actually trying to do. Are you counting pixels of a certain colour in an image? You may be using the wrong tool altogether. – Mark Setchell Dec 17 '20 at 11:09
  • @MarkSetchell I'm identifying colors on maps, and from those I work out how much tree cover, water and so on there is. I'm using imagemagick to create a histogram and calculate it from there. – Adam Larsson Dec 17 '20 at 11:20
  • If you update your question, or maybe start a new one and show your images and what you are trying to do, I think we can get you a MUCH BETTER solution than parsing textual output about images... – Mark Setchell Dec 17 '20 at 12:09
  • @MarkSetchell the maps are 600x600px, and by using imagemagick I get a list of how many pixels there are of each color. Then I compare the list against the colors I'm looking for at the moment, which are specified in the modes array. – Adam Larsson Dec 17 '20 at 19:27
  • Ask a new question (they are free, same as answers) with your images attached and a proper description. – Mark Setchell Dec 17 '20 at 19:41
  • @MarkSetchell I will do some tests first so I know more specifically what I want to ask. And no, questions are not "free": asking too much and not getting upvotes will block you from asking any more. – Adam Larsson Dec 17 '20 at 19:46
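
The cost of forking discussed in the comments above can be checked with a rough timing sketch like the following. The path in `f` and the iteration count are made-up values for illustration, and the absolute timings will differ from system to system.

```bash
#!/usr/bin/env bash
f="/some/dir/map.png"    # hypothetical path, not from the question
iterations=500           # arbitrary count, chosen to keep the run short

with_fork() {
    local i name
    for (( i = 0; i < iterations; i++ )); do
        name=$(basename "$f")    # forks a subshell and runs an external program
    done
    last_forked=$name
}

without_fork() {
    local i name
    for (( i = 0; i < iterations; i++ )); do
        name=${f##*/}            # parameter expansion, handled inside the shell
    done
    last_builtin=$name
}

# `time` is a bash keyword, so it can time shell functions directly.
time with_fork
time without_fork
```

Both functions compute the same file name; only the way they get there differs, so any gap in the reported times comes from the command substitution and the external `basename` call.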

1 Answer


You can replace all the subshells and external commands shown in your question with bash built-ins.

  • kik=$(echo ${modes[$mmode,2]} | cut -d "," -f $mloop) can be replaced by
    mapfile -d, -t -s$((mloop-1)) -n1 kik <<< "${modes[$mmode,2]}" (the -d option needs bash 4.4 or newer).
    If $mmode is constant here, better split the string once and replace the whole loop with
    IFS=, read -r -a kiks <<< "${modes[$mmode,2]}"; for kik in "${kiks[@]}"; do ...; done.
  • filename=$(basename "$f") can be replaced by
    filename=${f##*/} which runs 100 times faster, see benchmark.
  • pros2=$(echo "100/$totpix*$xcolorall" | bc -l) can be replaced by
    (( pros2 = 100 * xcolorall / totpix )) if you don't care for the decimals, or by
    precision=2; (( pros = 10**precision * 100 * xcolorall / totpix )); printf -v pros "%0${precision}d" "$pros"; pros="${pros:0: -precision}.${pros: -precision}" if you want 2 decimal places.
    Of course you can leave out the last commands (for turning 12345 into 123.45) until you really need the decimal number.
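
Put together, the replacements above might look like the following minimal sketch. The sample values for `modes_line`, `f`, `totpix` and `xcolorall` are made up for illustration, and the helper `percent_2dp` with its result variable `percent_result` is a hypothetical name, not something from the question. The `%03d` padding is a small variation on the formatting shown above, chosen so that values below 1 % keep a leading zero.

```bash
#!/usr/bin/env bash
# Hypothetical sample values; the real script takes these from its histogram.
modes_line="9,12,18,19,20,30,43,44,45,46,47,48,49"
f="/maps/region/tile_07.png"
totpix=1200
xcolorall=750

# Split the comma-separated list once instead of calling cut on every pass.
IFS=, read -r -a kiks <<< "$modes_line"

# Strip the directory part without forking basename.
filename=${f##*/}

# Percentage with two decimals using integer arithmetic only.
# The result is stored in the global percent_result to avoid $(...).
percent_2dp() {
    local part=$1 total=$2 scaled
    scaled=$(( 10000 * part / total ))     # percentage scaled by 100
    printf -v scaled '%03d' "$scaled"      # pad so that 5 becomes 005
    percent_result="${scaled:0: -2}.${scaled: -2}"
}

for kik in "${kiks[@]}"; do
    :   # per-color work would go here
done

percent_2dp "$xcolorall" "$totpix"
echo "$filename: ${#kiks[@]} colors, $percent_result%"
```

With these sample values the last line prints `tile_07.png: 13 colors, 62.50%`. The negative length in `${scaled:0: -2}` needs bash 4.2 or newer.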

But if speed really matters, write the script in another language. I think awk, perl, or python would be a good match here.

Socowi
  • I've just tested your solutions, and the first one lowered the runtime of a 40x loop from 8.9 sec to 3.5 sec. I'm amazed that subshells are so slow. – Adam Larsson Dec 18 '20 at 08:43
  • Thank you for letting me know. Can you elaborate on the *"didn't work"*? With `totpix=1200; xcolorall=750` I get `62.50` which is even more precise than the `62.49999999999999999750` I get from `bc`. Could it be that `xcolorall` is a floating point number too? If so, then you have to apply the same trick here too. – Socowi Dec 18 '20 at 20:30
  • When I'm running with set -e, the '(( pros = 10**precision * 100 * xcolorall / totpix ))' command exits with error 1, and I'm not getting any results from it, even if I change pros to pros2 in your code. xcolorall is a number of pixels, not a floating-point value. I updated the question with my complete function to calculate the percentage. – Adam Larsson Dec 19 '20 at 08:02
  • Could it be because xcolorall starts with the value 0, since in the first loop xcolorall hasn't been assigned any pixels yet? – Adam Larsson Dec 19 '20 at 08:13
  • `set -e` is the only problem here. I never use it because of problems like this one. If the expression inside `(( expr ))` evaluates to `0` then the exit code is 1. This isn't an error, but a feature to write C style `if (( expr ))`, but with `set -e` the script exits. You can remove the `set -e` or (if you really want to keep it) append `|| true` after each `(( expr ))`. – Socowi Dec 19 '20 at 18:15
  • I had previously used set +e just before it, and it didn't work. However, when testing your code standalone it works, so something else is messing it up. I'll investigate. – Adam Larsson Dec 19 '20 at 19:05
  • Just a small update: everything is working per your code, and I have no idea why it didn't work at first. Thanks to your code, an image that took 11.7 seconds now only takes 0.6 seconds! I never suspected that subshells were such hogs. – Adam Larsson Dec 20 '20 at 13:54
  • Thank you for letting me know. I'm glad I could help. – Socowi Dec 20 '20 at 16:07
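
The exit-status behaviour of `(( expr ))` described in the comments above can be seen in a small sketch. The variable names `status_*` are hypothetical, and the values of `xcolorall` and `totpix` are sample numbers chosen so that the first result is 0.

```bash
#!/usr/bin/env bash
xcolorall=0      # sample value: no pixels counted yet
totpix=1200      # sample value

# (( expr )) returns status 1 when expr evaluates to 0, and status 0 otherwise.
status_zero_result=0
(( pros = 100 * xcolorall / totpix )) || status_zero_result=$?

status_nonzero_result=0
(( pros = 100 * 600 / totpix )) || status_nonzero_result=$?

# An assignment using $(( )) always succeeds, so it is also safe under set -e.
pros=$(( 100 * xcolorall / totpix ))
status_assignment=$?

echo "$status_zero_result $status_nonzero_result $status_assignment"
```

This prints `1 0 0`. Writing the calculation as `pros=$(( ... ))` is therefore one way to keep `set -e` without appending `|| true` to each arithmetic command.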