
I am trying, in bash, to do work in all subfolders in parallel and print a status per folder once each one is done.

suppose I have a work function which can return a couple of statuses

# param #1 is the folder
# returns 1 on fail, 2 on success, 3 if nothing happened
work(){
    cd "$1"
    # some update thing
    return 1  # or 2, or 3
}

now I call this in my wrapper function

do_work(){

  while read -r folder; do
    tput cup "${row}" 20
    echo -n "${folder}"
    (
      ret=$(work "${folder}")
      tput cup "${row}" 0
      [[ $ret -eq 1 ]] && echo " \e[0;31mupdate failed      \uf00d\e[0m"
      [[ $ret -eq 2 ]] && echo " \e[0;32mupdated            \uf00c\e[0m"
      [[ $ret -eq 3 ]] && echo " \e[0;32malready up to date \uf00c\e[0m"
    ) &>/dev/null
    pids+=("${!}")

    ((++row))
  done < <(find . -maxdepth 1 -mindepth 1 -type d -printf "%f\n" | sort)
  echo "waiting for pids ${pids[*]}"

  wait "${pids[@]}"
}

and what I want is, that it prints out all the folders per line, and updates them independently from each other in parallel and when they are done, I want that status to be written in that line.

However, I am unsure which subshell writes what, which ones I need to capture, and how. My attempt above currently neither writes correctly nor runs in parallel. If I do get it to run in parallel, I get those `[1] <PID>` and `[1] + 3156389 done ...` messages cluttering my screen. If I put the work itself in a subshell, I have nothing to wait for. If I instead collect the pids, I don't get the return code back to print the status text.
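For what it's worth, two of these pitfalls are separable: `ret=$(work "${folder}")` captures stdout rather than the return code, while `wait <pid>` does hand back that job's exit status; and the `[1] <PID>` / `[1]+ Done` notices come from interactive job control (monitor mode), which is off by default in scripts and can be switched off with `set +m`. A minimal sketch with a stub `work` (names and statuses are hypothetical):

```shell
#!/usr/bin/env bash

# stub standing in for the real per-folder update (statuses are hypothetical)
work() {
    case "$1" in
        bad)  return 1 ;;   # update failed
        good) return 2 ;;   # updated
        *)    return 3 ;;   # already up to date
    esac
}

work good &          # run in the background ...
pid=$!
wait "$pid"          # ... then wait <pid> returns that job's exit status
ret=$?
echo "status=$ret"   # → status=2
```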

I did have a look at GNU Parallel, but I don't think it can give me that behaviour. (I think I could hack it so that finished jobs are printed, but I want all running jobs printed, with the finished ones amended.)

Joel
  • No, you can't do full screen text editing in a bash shell. Bash only writes where the cursor is. Plus, strings written to stdout are not serialized, so the various messages will be mixed with each other. To get that kind of display, you'd need to use an application. – Tim Roberts May 06 '22 at 03:01
  • what do you mean by 'application'. I mean with `tput` I can write where ever I want, if i understood that correctly. – Joel May 06 '22 at 03:03
  • What I meant is managing the subshells with something like a Python script. It can serialize the outputs through a central "monitor" and use ncurses to display them cleverly. – Tim Roberts May 06 '22 at 03:09
  • how many parallel jobs do you need to run? – pynexj May 06 '22 at 05:34
  • I wonder how that makes a difference, but in the order of 10s to 100s. – Joel May 06 '22 at 05:35
  • for <10 jobs you may take advantage of tmux/screen which can split the terminal into multiple regions and run each job in there. :) – pynexj May 06 '22 at 06:13
  • just curious ... are you looking to display 1 folder per line? if 'yes', what happens when the number of folders is greater than the number of lines in your console/terminal (eg, 200)? – markp-fuso May 06 '22 at 13:51
  • re: those pesky job control messages ... a web search on `bash suppress background job messages` brings up several hits, eg, [this](https://stackoverflow.com/q/11097761), [this](https://superuser.com/q/305933) and [this](https://unix.stackexchange.com/q/26534) – markp-fuso May 06 '22 at 13:58
  • GNU `parallel` is awesome for this kind of work. – Thomas Oct 11 '22 at 18:49

3 Answers


Assumptions/understandings:

  • a separate child process is spawned for each folder to be processed
  • the child process generates messages as work progresses
  • messages from child processes are to be displayed in the console in real time, with each child's latest message being displayed on a different line

The general idea is to set up a means of interprocess communication (IPC) ... named pipe, normal file, queuing/messaging system, sockets (plenty of ideas available via a web search on bash interprocess communications); the children write to this system while the parent reads from it and issues the appropriate tput commands.

One very simple example using a normal file:

> status.msgs                           # initialize our IPC file

child_func () {
    # Usage: child_func <unique_id> <other> ... <args>

    local i

    for ((i=1;i<=10;i++))
    do
        sleep $1

        # each message should include the child's <unique_id> ($1 in this case);
        # parent/monitoring process uses this <unique_id> to control tput output

        echo "$1:message - $1.$i" >> status.msgs
    done
}

clear
( child_func 3 & )
( child_func 5 & )
( child_func 2 & )

while IFS=: read -r child msg
do
    tput cup $child 10
    echo "$msg"
done < <(tail -f status.msgs)

NOTES:

  • the (child_func 3 &) construct is one way to eliminate the OS message re: 'background process completed' from showing up in stdout (there may be other ways but I'm drawing a blank at the moment)
  • when using a file (normal, pipe) OP will want to look at a locking method (flock?) to ensure messages from multiple children don't stomp on each other
  • OP can get creative with the format of the messages printed to status.msgs in conjunction with parsing logic in the parent's while loop
  • assuming variable width messages OP may want to look at appending a tput el on the end of each printed message in order to 'erase' any characters leftover from a previous/longer message
  • exiting the loop could be as simple as keeping count of the number of child processes that send a message <id>:done, or keeping track of the number of children still running in the background, or ...
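As a sketch of that last bullet (loop exit by counting `<id>:done` messages): the file contents below are pre-seeded so the sketch runs standalone, whereas the real version would read from the `tail -f` pipeline.

```shell
#!/usr/bin/env bash

# hypothetical loop-exit logic: stop once every child has sent "<id>:done";
# messages are pre-seeded here so the sketch runs without real children
total=3
printf '%s\n' "1:working" "1:done" "2:done" "3:working" "3:done" > status.msgs

done_count=0
while IFS=: read -r child msg; do
    if [[ $msg == done ]]; then
        (( ++done_count ))
        (( done_count == total )) && break   # with tail -f, break ends the reader
    fi
done < status.msgs

echo "children finished: $done_count"
rm -f status.msgs
```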

Running this at my command line generates 3 separate lines of output that are updated at various times (based on the sleep $1):

                          # no output to line #1
  message - 2.10          # messages change from 2.1 to 2.2 to ... to 2.10
  message - 3.10          # messages change from 3.1 to 3.2 to ... to 3.10
                          # no output to line #4
  message - 5.10          # messages change from 5.1 to 5.2 to ... to 5.10

NOTE: comments not actually displayed in console

markp-fuso
  • Why do you need to fork (put into bg with a `&`) in a subshell `(... &)` instead of doing it directly in the parent shell `{ ...; } &` ? Are there any reasons/advantages? – imz -- Ivan Zakharyaschev Oct 11 '22 at 14:49
  • @imz--IvanZakharyaschev added a brief explanation; when background jobs complete they tend to print a message to stdout; the construct you're asking about was the only way I could find to make sure the 'background job completed' message didn't show up in stdout; there are likely other/better ways to do this but I was drawing a blank at the time (and at the current moment, too); modify those 3 lines with comparable calls and see if there are differences in the final output – markp-fuso Oct 11 '22 at 14:59

Based on @markp-fuso's answer:

printer() {
    while IFS=$'\t' read -r child msg
    do
        tput cup $child 10
        echo "$child $msg"
    done
}

clear
parallel --lb --tagstring "{%}\t{}" work ::: folder1 folder2 folder3 | printer
echo
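In case it helps to see the moving parts: `{%}` is GNU Parallel's job-slot number, so each concurrent slot maps to one stable screen line, and `--tagstring` prefixes every output line with it. Below is a simulation of the tab-separated stream `printer` consumes, runnable without `parallel` installed; swapping the `tput` call for a plain prefix is an editorial substitution so it works on any terminal.

```shell
#!/usr/bin/env bash

# same tab-separated protocol as above, with the cursor-addressing
# tput call replaced by a plain prefix so it runs on any terminal
printer() {
    while IFS=$'\t' read -r child msg; do
        # real version: tput cup "$child" 10
        echo "slot $child: $msg"
    done
}

# simulated output of: parallel --lb --tagstring "{%}\t{}" work ::: ...
printf '1\tfolder1 updated\n2\tfolder2 updated\n' | printer
```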

Ole Tange

You can't pass exit statuses around like that. Try this instead: rework your `work` function to echo its status:

work(){
    cd "$1"
    # some update thing &> /dev/null, without output
    echo "${1}_${status}"   # status = 1, 2 or 3
}

And then set up data collection from all folders like so:

data=$(
    while read -r folder; do
        work "$folder" &
    done < <(find . -maxdepth 1 -mindepth 1 -type d -printf "%f\n" | sort)
    wait
)

echo "$data"
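Why this collects everything: the command substitution's stdout is shared by all the background jobs, and `wait` keeps it open until every child has written. A self-contained sketch with a stub `work` (the folder names are made up):

```shell
#!/usr/bin/env bash

# stub work: the real function would echo "<folder>_<status>"
work() { echo "${1}_2"; }

# all background jobs inherit the substitution's stdout;
# wait holds it open until every child has written its line
data=$(
    for folder in alpha beta gamma; do
        work "$folder" &
    done
    wait
)

echo "$data" | sort   # order is nondeterministic, so sort for a stable view
```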
Ivan