0

I have a Bash function library and one function is proving problematic for testing. prunner is a function that is meant to provide some of the functionality of GNU Parallel, and avoid the scoping issues of trying to use other Bash functions in Perl. It supports setting a command to run against the list of arguments with -c, and setting the number of background jobs to run concurrently with -t.

In testing it, I have ended up with the following scenario:

  • `prunner -c "gzip -fk" *.out` - works as expected in test.bash and interactively.
  • `find . -maxdepth 1 -name "*.out" | prunner -c echo -t 6` - does not work, seemingly ignoring `-c echo`.

Testing was performed on Ubuntu 16.04 with Bash 4.3 and on Mac OS X with Bash 4.4.

What appears to be happening with the latter in test.bash is that getopts is refusing to process -c, and thus prunner will try to directly execute the argument without the prefix command it was given. The strange part is that I am able to observe it accepting the -t option, so getopts is at least partially working. Bash debugging with set -x has not been able to shed any light on why this is happening for me.

Here is the function in question, lightly modified to use echo instead of log and quit so that it can be used separately from the rest of my library:

    prunner () {
      local PQUEUE=()
      while getopts ":c:t:" OPT ; do
        case ${OPT} in
          c) local PCMD="${OPTARG}" ;;
          t) local THREADS="${OPTARG}" ;;
          :) echo "ERROR: Option '-${OPTARG}' requires an argument." ;;
          *) echo "ERROR: Option '-${OPTARG}' is not defined." ;;
        esac
      done
      shift $(($OPTIND-1))
      for ARG in "$@" ; do
        PQUEUE+=("$ARG")
      done
      if [ ! -t 0 ] ; then
        while read -r LINE ; do
          PQUEUE+=("$LINE")
        done
      fi
      local QCOUNT="${#PQUEUE[@]}"
      local INDEX=0
      echo "Starting parallel execution of $QCOUNT jobs with ${THREADS:-8} threads using command prefix '$PCMD'."
      until [ ${#PQUEUE[@]} == 0 ] ; do
        if [ "$(jobs -rp | wc -l)" -lt "${THREADS:-8}" ] ; then
          echo "Starting command in parallel ($(($INDEX+1))/$QCOUNT): ${PCMD} ${PQUEUE[$INDEX]}"
          eval "${PCMD} ${PQUEUE[$INDEX]}" || true &
          unset PQUEUE[$INDEX]
          ((INDEX++)) || true
        fi
      done
      wait
      echo "Parallel execution finished for $QCOUNT jobs."
    }

Can anyone please help me to determine why -c options are not working correctly for prunner when lines are piped to stdin?

MrDrMcCoy
  • How does it behave with `printf` replacing `echo`? – bishop Mar 22 '18 at 23:59
  • 3
    Thanks for replacing `log` with `echo` to make your function more independent. How about taking that idea further and removing everything else that's not required to reproduce the issue? (this is the M in [MCVE](https://stackoverflow.com/help/mcve)) – that other guy Mar 23 '18 at 00:00
  • Small note: don't use all caps for your variable names. Think of variables whose names are all caps as being reserved for system use, because that's the general convention. It makes less difference if you religiously declare variables local, but it is still best practice to avoid the use of names in all caps (except when referring to system environment variables, of course). – rici Mar 23 '18 at 00:32
  • Can you elaborate on why you do not use `parallel --embed` to embed GNU Parallel as a function in your shell script? – Ole Tange Mar 23 '18 at 06:55

2 Answers

7

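My guess is that you are executing the two commands in the same shell. In that case, in the second invocation, OPTIND will still have the value 3 (which is where it got to on the first invocation), and that is where getopts will start scanning. With the arguments -c echo -t 6, the third argument is -t, so -c echo is skipped while -t 6 is still parsed, which matches what you observed.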

If you use getopts to parse arguments to a function (as opposed to a script), declare local OPTIND=1 to avoid invocations from interfering with each other.
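A minimal sketch of this behavior, assuming Bash. The function names parse_shared and parse_local are hypothetical; only getopts, OPTIND, and OPTARG come from the shell itself. Each function is called twice in the same shell, and only the one that declares OPTIND local sees -c on its second call:

```shell
#!/usr/bin/env bash
# Hypothetical demo functions; getopts, OPTIND and OPTARG are the real shell features.

# No local OPTIND: the second call resumes at the index the first call reached.
parse_shared () {
  local opt cmd="(unset)"
  while getopts ":c:" opt ; do
    case ${opt} in
      c) cmd="${OPTARG}" ;;
    esac
  done
  echo "cmd=${cmd}"
}

# Local OPTIND: every call starts scanning at its own first argument.
parse_local () {
  local OPTIND=1 opt cmd="(unset)"
  while getopts ":c:" opt ; do
    case ${opt} in
      c) cmd="${OPTARG}" ;;
    esac
  done
  echo "cmd=${cmd}"
}

parse_shared -c first    # prints cmd=first, leaves the global OPTIND at 3
parse_shared -c second   # prints cmd=(unset), scanning starts past the arguments
parse_local -c first     # prints cmd=first
parse_local -c second    # prints cmd=second
```

A function on the right-hand side of a pipe runs in a subshell that inherits the current value of OPTIND, so the same stale index applies there too.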

rici
  • 2
    That's totally what it was. Not sure how those variables got set, since I'm not calling `getopts` outside of any functions within the test script. Setting them locally in the function before calling `getopts` solves the issue. Thanks! – MrDrMcCoy Mar 23 '18 at 00:44
  • 3
    @MrDrMcCoy: All you have to do is call your function twice (or any other function which uses getopts). `OPTIND` is a **global** variable, with all that implies (unless declared local). It's a curiosity of bash variable scoping that you can get away with declaring it local like this; bash locals are rather like perl locals in this respect. – rici Mar 23 '18 at 00:48
  • @rici Why declare it as local? My understanding is that any invocation of getopts clobbers OPTIND anyway, so it should be safe to omit the local and write `OPTIND=1` right before calling getopts. This way it's also sh-compatible. I'm just checking in case I missed something, your answer works for me otherwise – cjfp Sep 05 '22 at 23:23
  • 1
    @cjfp: In case you call a function which uses getopts to parse its arguments from inside an option handler in another getopts loop. Which is basically the same reason you would make any variable local. If you want `sh` compatibility, you'd have to save OPTIND and restore it, which seems a bit out of scope for this [tag:bash]-tagged question. (That's essentially how bash handles local declarations, internally.) – rici Sep 06 '22 at 00:45
  • @rici: Oh right, of course. To summarize: local (bash) or save/restore (sh) are only necessary to prevent overwriting OPTIND with recursive getopts. If you don't do anything or call anything that modifies OPTIND in the getopts loop, then a plain OPTIND=1 before the loop is fine. – cjfp Sep 06 '22 at 04:08
  • 1
    @cjfp: it's more like "if whoever called you doesn't rely on the value of `OPTIND`, then you're free to clobber it." Like any other variable. IMHO, not declaring variables as local is sloppy programming, which will inevitably come back to haunt you someday. Note that many `getopts` clients rely on the value of `OPTIND` after the `getopts` loop is done, since it still tells you where the first positional argument is. Anyway, YMMV, but I'm sticking with `local`. – rici Sep 06 '22 at 07:19
  • @rici: Well, I agree, but I'm writing everything in sh and recursion-safe save/restore sounds... out of scope, like you said. Will ask a new question if I need to. Appreciate the clarifications. – cjfp Sep 06 '22 at 12:09
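The nested case rici describes can be sketched as follows, assuming Bash. The names outer and inner are hypothetical. inner parses its own options from inside one of outer's option handlers; because each function declares its own local OPTIND, inner starts at its first argument and outer should resume with -b once inner returns:

```shell
#!/usr/bin/env bash
# Hypothetical demo functions showing a getopts loop nested inside another one.

inner () {
  local OPTIND=1 opt seen=""
  while getopts ":xy" opt ; do
    seen+="${opt}"           # collect every option letter inner was given
  done
  echo "inner saw: ${seen}"
}

outer () {
  local OPTIND=1 opt
  while getopts ":abc" opt ; do
    case ${opt} in
      a) inner -x -y ;;      # runs a second getopts loop mid-parse
      b|c) echo "outer saw: ${opt}" ;;
    esac
  done
}

outer -a -b -c
```

If inner used a plain OPTIND=1 instead of a local one, it would overwrite outer's OPTIND, which would then be 3 when inner returns, so outer would be expected to skip -b.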
0

Perhaps you are already doing this, but make sure to pass the top-level shell parameters to your function. The function will receive the parameters via the call, for example:

    xyz () {
        echo "First arg: ${1}"
        echo "Second arg: ${2}"
    }
    xyz "This is" "very simple"

In your example, you should always call the function with the script's positional parameters so that getopts can process them inside the function.

    prunner "$@"

Note that prunner will not modify the script's positional parameters outside of the function.

CharlieH