Bash has the command substitution syntax $(f), which captures the STDOUT of a command f. If the command is an executable, this is fine: a new process has to be created anyway. But if the command is a shell function, this syntax adds an overhead of about 25 ms per subshell on my system. That is enough to add up to noticeable delays when used in inner loops, especially in interactive contexts such as command completions or $PS1.

A common optimization is to return values through global variables instead [1], but it comes at a cost to readability: the intent becomes less clear, and capturing output suddenly works differently for shell functions and executables. I am adding a comparison of options and their weaknesses below.

To get a consistent, reliable syntax, I was wondering whether bash has any feature that captures the output of shell functions and executables alike, while avoiding subshells for shell functions.

Ideally, a solution would also contain a more efficient alternative to executing multiple commands in a subshell, which allows more cleanly isolating concerns, e.g.

person=$(
    db_handler=$(database_connect)    # avoids leaking the variable
    query "$db_handler" lastname      #   outside its required
    echo ", "                         #   scope.
    query "$db_handler" firstname
    database_close "$db_handler"
)

Such a construct allows the reader of the code to ignore everything inside $(), if the details of how $person is formatted aren't interesting to them.


Comparison of Options

1. With command substitution

person="$(get lastname), $(get firstname)"

Slow, but readable and consistent: It doesn't matter to the reader at first glance whether get is a shell function or an executable.

2. With same global variable for all functions

get lastname
person="$R, "
get firstname
person+="$R"

Obscures what $person is supposed to contain. Alternatively,

get lastname
local lastname="$R"
get firstname
local firstname="$R"
person="$lastname, $firstname"

but that's very verbose.

3. With different global variable for each function

get_lastname
get_firstname
person="$lastname, $firstname"
  • More readable assignment, but
  • If some function is invoked twice, we're back to (2).
  • The side-effect of setting the variable is not obvious.
  • It is easy to use the wrong variable by accident.

4. With global variable, whose name is passed as argument

get LN lastname
get FN firstname
person="$LN, $FN"
  • More readable, allows multiple return values easily.
  • Still inconsistent with capturing output from executables.
  • Note: Assignment to dynamic variable names should be done with declare rather than eval:

    $VARNAME="$LOCALVALUE"            # doesn't work.
    declare -g "$VARNAME=$LOCALVALUE" # will work.
    eval "$VARNAME='$LOCALVALUE'"     # doesn't work for *arbitrary* values.
    eval "$VARNAME=$(printf %q "$LOCALVALUE")"
                                      # doesn't avoid a subshell after all.
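
As a rough illustration of option 4, here is a minimal sketch of a `get VARNAME FIELD` helper that assigns through bash's `printf -v`, which needs neither `eval` nor a subshell. The body of `get`, the sample names, and the `get__value` local are assumptions made up for this example; only the call syntax comes from the question.

```shell
#!/usr/bin/env bash
# Hypothetical implementation of the `get VARNAME FIELD` helper from option 4.
# printf -v (bash only) stores the formatted string in the named variable.
get() {
    local get__value                      # prefixed to reduce clashes with caller names
    case $2 in
        lastname)  get__value="O'Neil" ;; # sample data; the quote is deliberate
        firstname) get__value='Jo Ann' ;;
        *) return 1 ;;                    # unknown field
    esac
    printf -v "$1" '%s' "$get__value"     # assign to the caller's variable, no fork
}

get LN lastname
get FN firstname
person="$LN, $FN"
echo "$person"
```

The value is passed to `printf` as data rather than spliced into code, so quotes and spaces in it need no escaping.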
    

[1] http://rus.har.mn/blog/2010-07-05/subshells/

kdb
  • About the last snippet: `eval "$VARNAME=\$LOCALVALUE";` should do what you want (it might be a good idea to use lower-case names unless you're dealing with `export`ed variables). – Petr Skocik Sep 05 '19 at 11:51
  • With a sufficiently modern Bash shell, you can use a global associative array, and assign the return value to the key matching the exact function name. Now one can consider globals an evil practice, especially if you want to run jobs or co-routines. People have done lots of twisted stuff with Bash, including objects and classes. In the end, if Bash or shell is not fast enough or does not let you code your task clearly, then it may just be the wrong language for the job. – Léa Gris Sep 05 '19 at 12:32
  • `printf -v` is even safer than `declare`. – chepner Sep 05 '19 at 12:48
  • @LéaGris I was thinking mostly of things like the `$PS1` prompt, but thinking of it again, it's perfectly doable to delegate even that to a python script or compiled executable... – kdb Sep 05 '19 at 13:43
  • `be done with declare` - yes, but with `declare -n`, not `declare -g`. – KamilCuk Sep 05 '19 at 20:49

2 Answers

If you want it to be efficient, the shell functions can't return their result via stdout. If they did, the only way to get it would be to run the function in a subshell and capture the output via an internal pipe, and these operations are fairly expensive (a few ms on a modern system).

When I was focusing on shell scripts and needed to maximize their performance, I used a convention where a function foo returns its result via a variable foo. This works even in a POSIX shell, and it has the nice property that it won't overwrite your locals: if foo is a function, you've already more or less reserved the name.
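
A minimal sketch of that convention, assuming a made-up helper named `bx_basename` (it is not part of the original code): the function stores its result in the variable of the same name, and the caller reads that variable instead of capturing stdout.

```shell
# Hypothetical helper following the convention: the function bx_basename
# returns its result in the variable bx_basename instead of printing it.
bx_basename() {
    bx_basename=${1##*/}    # strip everything up to the last slash; no fork
}

bx_basename /usr/local/bin/bash
name=$bx_basename           # read the result from the same-named variable
echo "$name"                # prints: bash
```

Functions and variables live in separate namespaces, so sharing the name works in any POSIX shell.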

Then I had this getter function, bx_r, which runs a shell function and handles its output depending on the first argument: if that argument is a legal variable name, the output is saved into that variable; otherwise it is printed to stdout (without a trailing newline if the argument is exactly the empty word, i.e., '').

I've modified it so it can be used uniformly with either commands or functions.

You can't use the type builtin to differentiate between the two here, because type returns its result via stdout: you'd need to capture that result, which would impose the forking penalty again.

So when I'm about to run function foo, I check whether there's a corresponding variable foo (this can catch a local variable, but you'll reduce the chances of that if you limit yourself to properly namespaced shell function names). If there is, I assume that's where function foo returns its result; otherwise I run it in a $(), capturing its stdout.

Here's the code with some testing code:

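bx_varlike_eh()
{
    # succeed only if $1 looks like a legal variable name:
    # non-empty, no leading digit, and only [A-Za-z_0-9] characters
    case $1 in
        (''|[0-9]*|*[!A-Za-z_0-9]*) false;;
        (*) true;;
    esac
}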
bx_r() #{{{ Varname=$1; shift; Invoke $@ and save it to $Varname if a legal varname or print it
{
    # `bx_r '' some_command` prints without a newline
    # `bx_r - some_command` (or any non-variable-character-containing word instead of -) 
    #           prints with a newline

    local bx_r__varname="$1"; shift 1
    local bx_r
    if ! bx_varlike_eh "$1" || eval "[ \"\${$1+set}\" != set ]"; then
        #https://unix.stackexchange.com/a/465715/23692
        bx_r=$( "$@" ) || return #$1 not varlike or unset => must be a regular command, so capture
    else
        #if $1 is a variable name, assume $1 is a function that saves its output there
        "$@" || return
        eval "bx_r=\$$1" #put it in bx_r
    fi
    case "$bx_r__varname" in
        ('') printf '%s' "$bx_r";;
        ([!A-Za-z_0-9]*) printf '%s\n' "$bx_r";;
        (*) eval "$bx_r__varname=\$bx_r";;
    esac
} #}}}

#TEST
for sh in sh bash; do
    time $sh -c '
    . ./bx_r.sh
    bx_getnext=; bx_getnext() { bx_getnext=$((bx_getnext+1)); }
    bx_r - bx_getnext
    bx_r - bx_getnext
    i=0; while [ $i -lt 10000 ]; do
        bx_r ans bx_getnext
        i=$((i+1)); done; echo ans=$ans
    '
    echo ====

    $sh -c '
    . ./bx_r.sh
    bx_r - date
    bx_r - /bin/date
    bx_r ans /bin/date
    echo ans=$ans
    '
    echo ====
    time $sh -c '
    . ./bx_r.sh
    bx_echoget() { echo 42; }
    i=0; while [ $i -lt 10000 ]; do 
        ans=$(bx_echoget)
        i=$((i+1)); done; echo ans=$ans 
    '
done
exit

#MY TEST OUTPUT

1
2
ans=10002
0.14user 0.00system 0:00.14elapsed 99%CPU (0avgtext+0avgdata 1644maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
====
Thu Sep  5 17:12:01 CEST 2019
Thu Sep  5 17:12:01 CEST 2019
ans=Thu Sep 5 17:12:01 CEST 2019
====
ans=42
1.95user 1.14system 0:02.81elapsed 110%CPU (0avgtext+0avgdata 1656maxresident)k
0inputs+1256outputs (0major+350075minor)pagefaults 0swaps
1
2
ans=10002
0.92user 0.03system 0:00.96elapsed 99%CPU (0avgtext+0avgdata 3284maxresident)k
0inputs+0outputs (0major+159minor)pagefaults 0swaps
====
Thu Sep  5 17:12:05 CEST 2019
Thu Sep  5 17:12:05 CEST 2019
ans=Thu Sep 5 17:12:05 CEST 2019
====
ans=42
5.20user 2.40system 0:06.96elapsed 109%CPU (0avgtext+0avgdata 3220maxresident)k
0inputs+1248outputs (0major+949297minor)pagefaults 0swaps

As you can see, this gives you a uniform call syntax while speeding up the execution of small shell functions by up to about 14 times, because the captures ($()) are no longer needed.

Petr Skocik
  • The core of your solution is like my variant (3), but the bx_r function with your suggested extension would actually make it consistent. – kdb Sep 05 '19 at 13:55
  • @kdb I looked at it proper and edited the answer. The original bx_r did caching too. This one skips the caching but it adds the ability to handle commands too. – Petr Skocik Sep 05 '19 at 15:16
  • @kdb The way this bx_r tells if the command is a function returning through a variable is it checks if the corresponding variable exists (can't use the `type` builtin as that one outputs to stdout again). I also measured the speed up. Up to about 14 times with dash and 5 with bash. – Petr Skocik Sep 05 '19 at 15:19
  • I also frequently use the $() pattern for better isolation of scopes, see e.g. the "database_connect" example I just added to the question. Do you by any chance also know a solution how to improve the performance of such constructs, without losing the clean separation and code structure? Sadly, such a facility isn't even present in python... At least not in a standardized manner. – kdb Sep 05 '19 at 20:44
  • @kdb You can replace the $() with a shell function and use `local` variables inside it if it's performance critical. `local` is widely supported (even though it isn't POSIX, most shells have it). – Petr Skocik Sep 05 '19 at 20:49
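
The suggestion in the last comment could look roughly like the sketch below, which rewrites the question's `person=$( ... )` block as a function with `local` variables. The stub bodies of `database_connect`, `query` and `database_close`, their output-variable-first signatures, and the name `format_person` are assumptions invented for this illustration; the question's real helpers print to stdout instead.

```shell
#!/usr/bin/env bash
# Hypothetical stubs standing in for the question's database helpers.
# Each one returns its result in the variable named by its first argument.
database_connect() { printf -v "$1" '%s' 'handle-1'; }
query() {                            # usage: query OUTVAR HANDLE FIELD
    case $3 in
        lastname)  printf -v "$1" '%s' 'Doe' ;;
        firstname) printf -v "$1" '%s' 'Jane' ;;
    esac
}
database_close() { :; }              # nothing to release in this stub

# format_person OUTVAR: builds "lastname, firstname" without any subshell.
format_person() {
    local db_handler last first      # locals do not leak to the caller
    database_connect db_handler
    query last  "$db_handler" lastname
    query first "$db_handler" firstname
    database_close "$db_handler"
    printf -v "$1" '%s, %s' "$last" "$first"
}

format_person person
echo "$person"                       # prints: Doe, Jane
```

The reader can still skip the body of `format_person` when the formatting details are not interesting, and `db_handler` stays unset outside it. One caveat of this sketch: an output variable named `db_handler`, `last` or `first` would hit the function's own locals, so prefixed local names are safer in real code.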
Use a bash nameref.

With bash 4.3 or later you can use variable namerefs:

get() {
   # nameref: _get__res becomes an alias for the variable named by $1
   declare -n _get__res="$1"
   case "$2" in
   firstname) _get__res="Kamil" ;;
   lastname) _get__res="Cuk" ;;
   esac
}

get LN lastname
get FN firstname
person="$LN, $FN"

Namerefs can still clash with variables from the outer scope. Use long names for the namerefs; here I used an underscore, the function name, two underscores, and then the variable name.
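
One related way such a clash can show up is sketched below, assuming bash 4.3 or later. The function names `set_shadowed` and `set_prefixed` and the variable names are made up for this example. A nameref is resolved by name when it is used, so a function-local variable with the same name as the caller's target silently captures the assignment.

```shell
#!/usr/bin/env bash
# Hypothetical setter with short names: its local `value` shadows the caller's.
set_shadowed() {
    local -n out="$1"
    local value='from function'
    out="$value"                     # with $1 = value, this hits the local one
}

# Same setter with names prefixed as suggested above.
set_prefixed() {
    local -n _set_prefixed__out="$1"
    local _set_prefixed__value='from function'
    _set_prefixed__out="$_set_prefixed__value"
}

value='from caller'
set_shadowed value
after_shadowed=$value                # still "from caller": assignment was lost

set_prefixed value
after_prefixed=$value                # now "from function"

echo "$after_shadowed / $after_prefixed"
```

Prefixing every local in the function, not just the nameref itself, keeps such collisions unlikely.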

KamilCuk
  • Thanks for the tip! Clashes don't seem to actually be a problem though. https://pastebin.com/Ydgs2aZX – kdb Sep 06 '19 at 08:58