
I'm writing a script where I need to make sure a string contains a comma. If it doesn't, I need the script to exit. Consider the script below, in which I deliberately use only builtins to improve performance:

```sh
#!/bin/sh

check_for_commas='This string must contain a comma'

comma_found='0'
iterate_through_string="$check_for_commas"
while [ -n "$iterate_through_string" ]; do
    char="$(printf '%.1s' "$iterate_through_string")"

    if [ "$char" = ',' ]; then
        comma_found='1'
        break
    fi

    iterate_through_string="${iterate_through_string#?}"
done

if [ "$comma_found" != '1' ]; then
    echo 'Your string does not contain a comma. Exiting...'
    exit 1
else
    echo 'Found a comma in the string. Script can continue...'
fi
```

My script uses command substitution, which spawns a subshell for every character it iterates through. Compare it with this version, which calls the external `grep` once:

```sh
#!/bin/sh

check_for_commas='This string must contain a comma'

if [ "$(echo "$check_for_commas" | grep -q -F ','; echo "$?")" = '1' ]; then
    echo 'Your string does not contain a comma. Exiting...'
    exit 1
else
    echo 'Found a comma in the string. Script can continue...'
fi
```

I clearly don't mind doing a little extra work to squeeze out more performance, but I'm concerned that spawning so many subshells has defeated my whole initial intent.

Does my pursuit of only using builtins to enhance performance become useless when gratuitous use of subshells comes into the picture?

codeforester
Harold Fischer
  • BTW, `if printf '%s\n' "$check_for_commas" | grep -q -F ,; then` would still be expensive in terms of performance, but it's a lot less unnecessary syntax than the command substitution used above. – Charles Duffy Apr 07 '18 at 02:42
  • BTW, if you care more about runtime performance than startup time, you might think about targeting ksh93 rather than `/bin/sh`. – Charles Duffy Apr 07 '18 at 02:44
  • @CharlesDuffy I feel really stupid now. Your one-liners are amazing. I might be wasting my time with this POSIX stuff. I recently switched from bash to POSIX sh, and trust me, I get why you would recommend a more full-featured shell. – Harold Fischer Apr 07 '18 at 02:45
  • @HaroldFischer, Re "*wasting my time*": *POSIX* is more portable and often [significantly faster](https://unix.stackexchange.com/questions/148035/is-dash-or-some-other-shell-faster-than-bash). For some applications (embedded systems with low resources) little shells like `dash` can save much time. – agc Apr 07 '18 at 05:15
  • ...though there's certainly an argument that if one is targeting a low-resource embedded system, using a shell is the Wrong Thing altogether. If I were back in the tiny-systems end of embedded space, I'd be keeping an eye on https://gokrazy.org/ to have a single-binary userland. – Charles Duffy Apr 07 '18 at 13:10
  • You mentioned that `if printf '%s\n' "$check_for_commas" | grep -q -F ,; then` is still expensive in terms of performance, but is it *less* expensive than using command substitution? – Harold Fischer Apr 10 '18 at 20:35

2 Answers


Command substitutions, as in `$(printf ...)`, are indeed expensive, and you don't need them for what you're doing here. A `case` statement can match the string against a glob pattern directly, without a subshell:

```sh
case $check_for_commas in
  *,*) echo "Found a comma in the string";;
  *)   echo "No commas present; exiting"; exit 1;;
esac
```

In the more general case, a `fork()` alone costs less than a `fork()`/`execve()` pair, so a single subshell is cheaper than a single external-command invocation. But if you're comparing a loop that generates many subshells against a single external-command invocation, which is cheaper depends on how many times the loop iterates and on how expensive each of these operations is on your operating system (forks are traditionally extra expensive on Windows, for example), so answering it is a fact-intensive investigation. :)
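For comparison, here is a minimal sketch of the single-external-command approach, using the `printf ... | grep -q -F ,` form suggested in the comments on the question. It tests `grep`'s exit status directly instead of capturing `$?` with a command substitution. The function name `has_comma_grep` is hypothetical. It still pays for at least one fork/execve pair (for `grep`), so whether it beats a subshell-per-character loop depends on string length and platform, as described above.

```sh
#!/bin/sh
# Hypothetical helper: succeeds (status 0) if $1 contains a comma.
# grep -q prints nothing; -F treats "," as a fixed string, not a regex.
# The caller tests grep's exit status directly, so there is no
# $(...; echo "$?") command substitution, but grep is still an external
# command that has to be forked and exec'd.
has_comma_grep() {
    printf '%s\n' "$1" | grep -q -F ,
}

for s in 'This string must contain a comma' 'one, two'; do
    if has_comma_grep "$s"; then
        printf '%s: comma found\n' "$s"
    else
        printf '%s: no comma\n' "$s"
    fi
done
```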

(Speaking to the originally proposed code: ksh93 will optimize away the fork in the specific `var=$(printf ...)` case; by contrast, in bash you need to use `printf -v var ...` to get the same effect.)
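Plain POSIX sh has neither of those shortcuts, but the per-character subshell in the question's loop can still be avoided: the first character of a string can be extracted with parameter expansion alone, as `${rest%"${rest#?}"}`. The sketch below (the function name `has_comma_loop` is hypothetical) keeps the question's one-character-at-a-time structure without spawning any subshells; the `case` statement above remains the shorter option for a plain comma check.

```sh
#!/bin/sh
# Hypothetical helper: returns 0 if $1 contains a comma, 1 otherwise.
# It walks the string one character at a time like the question's loop,
# but uses only parameter expansion, so no subshell is spawned per character.
# POSIX sh has no `local`, so rest and char are global variables.
has_comma_loop() {
    rest=$1
    while [ -n "$rest" ]; do
        # ${rest#?} is everything after the first character; removing it
        # (quoted, so it matches literally) as a suffix leaves the first character.
        char=${rest%"${rest#?}"}
        if [ "$char" = ',' ]; then
            return 0
        fi
        rest=${rest#?}
    done
    return 1
}

has_comma_loop 'foo,bar' && echo 'comma found'
has_comma_loop 'foo bar' || echo 'no comma'
```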

Charles Duffy
  • I'm really starting to appreciate how powerful `case` is. Is there a way this could be used to look for the *number* of commas? – Harold Fischer Apr 07 '18 at 22:53
  • The checks happen in order, so you can do something like the following: `case $check_for_commas in *,*,*,*) echo "Found at least three commas";; *,*,*) echo "Found two commas";; *,*) echo "Found one comma";; *) echo "Found no commas";; esac` – Charles Duffy Apr 07 '18 at 23:11
  • Gotcha. So `case` is probably not a good way to count the occurrences of a substring in a large string. Out of curiosity, does `case` stop when it finds the first comma, or does it continue processing the rest of the string? – Harold Fischer Apr 07 '18 at 23:30
  • Stops at the first match. Bash has an extension adding extra syntax for fallthrough, but it's not available in POSIX sh. – Charles Duffy Apr 08 '18 at 00:02
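For the follow-up about counting commas in a large string, a loop can strip everything up to and including the first comma on each pass, so it iterates once per comma rather than once per character and still uses only builtins. This is a sketch: the function name `count_commas` and the global result variable `comma_count` are hypothetical, and the result is stored in a variable so that reading it doesn't require a `$(...)` subshell.

```sh
#!/bin/sh
# Hypothetical helper: counts the commas in $1 and stores the result in the
# global variable comma_count (printing it instead would force callers to
# capture it with $(...), which costs a subshell).
count_commas() {
    rest=$1
    comma_count=0
    while :; do
        case $rest in
            *,*)
                # Drop the shortest prefix ending in a comma, i.e. up to
                # and including the first comma.
                rest=${rest#*,}
                comma_count=$((comma_count + 1))
                ;;
            *)
                break
                ;;
        esac
    done
}

count_commas 'a,b,c'
echo "$comma_count"   # prints 2
```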

Here's a short POSIX shell function that combines the remove-matching-suffix-pattern expansion (`${1%,*}`) and the remove-matching-prefix-pattern expansion (`${1#...}`) with `test` (or rather `[`, which is the same thing) to return true if there's a comma:

```sh
# Inner quotes match the prefix literally, so e.g. 'x*' isn't a false positive.
chkcomma(){ [ "${1#"${1%,*}"}" ] ; }
```

Example without comma:

```sh
chkcomma foo && echo comma found || echo no comma
```

Output:

```
no comma
```

Example with comma:

```sh
chkcomma foo,bar && echo comma found || echo no comma
```

Output:

```
comma found
```
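To see why `chkcomma` works, it helps to print the two expansions it combines. In this sketch (the helper name `show_parts` is hypothetical), `${1%,*}` removes the shortest suffix matching `,*`, i.e. from the last comma to the end; removing that prefix from the front of the original string leaves a non-empty remainder exactly when a comma was present. The prefix is quoted here so that it is matched literally rather than as a glob pattern.

```sh
#!/bin/sh
# Hypothetical helper: prints the intermediate values behind chkcomma.
show_parts() {
    prefix=${1%,*}            # drop the last comma and everything after it
    remainder=${1#"$prefix"}  # drop that prefix; non-empty only if a comma was present
    printf '%s: prefix=[%s] remainder=[%s]\n' "$1" "$prefix" "$remainder"
}

show_parts foo,bar   # prints: foo,bar: prefix=[foo] remainder=[,bar]
show_parts foo       # prints: foo: prefix=[foo] remainder=[]
```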

This can be further abstracted to find substrings using globbing:

```sh
# Usage: instr globpattern string
# returns true if found, false if not.
# The inner quotes keep glob characters in the string itself from acting as a pattern.
instr(){ [ "${2#"${2%${1}*}"}" ] ; }
```

Example:

```sh
instr '[Mm]oo' mood && echo found
```

Output:

```
found
```
agc
  • Progress: that `instr` function improves upon an old *POSIX* shell function I used for the same effect, which needed `case` and `eval`. *OTOH*, the older function could do a logical *OR* using `|` (like `case`), which this function can't do... – agc Apr 07 '18 at 05:23
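A logical *OR* can be recovered without `case` or `eval` by trying each pattern in turn. This is a minimal sketch that assumes a hypothetical wrapper, `instr_any`, which takes the string first and then any number of glob patterns. The copy of `instr` inside it quotes the inner expansion so that glob characters in the searched string itself are matched literally.

```sh
#!/bin/sh
# instr as in the answer above, with the inner expansion quoted so that glob
# characters in the searched string are taken literally.
# Usage: instr globpattern string
instr(){ [ "${2#"${2%${1}*}"}" ] ; }

# Hypothetical wrapper. Usage: instr_any string globpattern...
# Returns true if any one of the patterns is found in the string.
instr_any() {
    str=$1
    shift
    for pat in "$@"; do
        instr "$pat" "$str" && return 0
    done
    return 1
}

instr_any mood '[Cc]at' '[Mm]oo' && echo found
```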