
I have a script where I parallelize job execution while monitoring the progress. I do this using xargs and a named FIFO. My problem is that while xargs performs well, some lines written to the pipe are lost. Any idea what the problem is?

For example, the following script (basically my script with dummy data) produces the output below and then hangs at the end, waiting for the missing lines:

$ bash test2.sh 
Progress: 0 of 99
DEBUG: Processed data 0 in separate process
Progress: 1 of 99
DEBUG: Processed data 1 in separate process
Progress: 2 of 99
DEBUG: Processed data 2 in separate process
Progress: 3 of 99
DEBUG: Processed data 3 in separate process
Progress: 4 of 99
DEBUG: Processed data 4 in separate process
Progress: 5 of 99
DEBUG: Processed data 5 in separate process
DEBUG: Processed data 6 in separate process
DEBUG: Processed data 7 in separate process
DEBUG: Processed data 8 in separate process
Progress: 6 of 99
DEBUG: Processed data 9 in separate process
Progress: 7 of 99
##### Script is hanging here (Could happen for any line) #####
#!/bin/bash
clear

printStateInLoop() {
  local pipe="$1"
  local total="$2"
  local finished=0

  echo "Progress: $finished of $total"
  while true; do
    if [ $finished -ge $total ]; then
      break
    fi

    let finished++
    read line <"$pipe"
    # In final script I would need to do more than just logging
    echo "Progress: $finished of $total"
  done
}

processData() {
  local number=$1
  local pipe=$2

  sleep 1 # Work needs time
  echo "$number" >"$pipe"
  echo "DEBUG: Processed data $number in separate process"
}
export -f processData

process() {
  TMP_DIR=$(mktemp -d)
  PROGRESS_PIPE="$TMP_DIR/progress-pipe"
  mkfifo "$PROGRESS_PIPE"

  DATA_VECTOR=($(seq 0 1 99)) # A bunch of data
  printf '%s\0' "${DATA_VECTOR[@]}" | xargs -0 --max-args=1 --max-procs=5 -I {} bash -c "processData \$@ \"$PROGRESS_PIPE\"" _ {} &

  printStateInLoop "$PROGRESS_PIPE" ${#DATA_VECTOR[@]}
}

process
rm -Rf "$TMP_DIR"

In another post I got the suggestion to switch to `while read line; do … done < "$pipe"` (function below) instead of `while true; do … read line < "$pipe" … done`, so that the pipe is not reopened and closed on every line read. This reduces the frequency of the problem, but it still happens: some lines are missing, and sometimes I get `xargs: bash: terminated by signal 13` (SIGPIPE, raised when a writer writes to the pipe after the reader has closed its end).

printStateInLoop() {
  local pipe="$1"
  local total="$2"
  local finished=0

  echo "Progress: $finished of $total"
  while [ $finished -lt $total ]; do
    while read line; do
      let finished++
      # In final script I would need to do more than just logging
      echo "Progress: $finished of $total"
    done <"$pipe"
  done
}

A lot of people on SO suggested using parallel or pv for this. Sadly, those tools aren't available on the very limited target platform, so my script is based on xargs instead.

  • Have you considered having each writer obtain a lock on the pipe before writing to it? A Google search on `bash linux flock write pipe` brings up several hints, including a promising-looking answer to [FIFO with single READER and multiple WRITERS in BASH](https://stackoverflow.com/q/64486022) – markp-fuso Nov 08 '20 at 21:29
  • There is no interlocking on writes to the pipe. Quite possibly events will get overwritten. Maybe you need to interlock the processes (flock?), or write each to a separate pipe and gather the results. – Dale Nov 08 '20 at 21:29
  • Thank you. That indeed solved the problem. I have already searched for a file lock mechanism but obviously did not use the right search terms. – Sebastian Barth Nov 08 '20 at 21:51

1 Answer


The solution (as pointed out by @markp-fuso and @Dale) was to create a file lock.

Instead of:

echo "$number" >"$pipe"

I now use flock to acquire (or wait for) a lock first:

flock "$pipe.lock" echo "$number" >"$pipe"
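For context, this is roughly how the writer function from the question looks with that change in place (a minimal sketch, not re-tested here; the `.lock` suffix is just a naming convention, any path all writers agree on works as the lock file, and flock will create it on first use if needed):

processData() {
  local number=$1
  local pipe=$2

  sleep 1 # Work needs time
  # flock takes an exclusive lock on "$pipe.lock" before running echo,
  # so only one xargs worker at a time writes its line into the FIFO.
  flock "$pipe.lock" echo "$number" >"$pipe"
  echo "DEBUG: Processed data $number in separate process"
}
export -f processData

The lock file itself carries no data; it is only the token the writers queue on. Since "$pipe.lock" sits next to the FIFO inside $TMP_DIR, the existing rm -Rf "$TMP_DIR" at the end of the script cleans it up as well.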