
I'm trying to parallelize the processing of a set of files using bash. I'm using named pipes to keep the number of concurrent processes fixed and to gather the output from those processes.

I'm assuming that writes to a named pipe are atomic, i.e. the output of different processes does not get mixed up. Is that a safe assumption?

Any advice is greatly appreciated. I'm limited to using bash.

Here's the code:

mytask()
{
  wItem=$1
  #dummy func; process workItem
  rt=$RANDOM
  st=$rt;
  let "rt %= 2"
  let "st %= 10"
  sleep $st
  return $rt
}

parallelizeTask()
{
workList=$1
threadCnt=$2
task=$3
threadSyncPipeD=$4
outputSyncPipeD=$5

ti=0
for workItem in $workList; do
  if [ $ti -lt $threadCnt ]; then
    # A slot is free: start the task in the background right away.
    { $task $workItem; if [ $? -eq 0 ]; then result="success"; else result="failure"; fi; \
      echo "$result:$workItem" >&$outputSyncPipeD; echo "$result:$workItem" >&$threadSyncPipeD; } &
    ((ti++))
    continue;
  fi
  # All slots are busy: block until one running task reports completion.
  while read n; do
      ((ti--))
      break;
  done <&$threadSyncPipeD
  { $task $workItem; if [ $? -eq 0 ]; then result="success"; else result="failure"; fi; \
    echo "$result:$workItem" >&$outputSyncPipeD; echo "$result:$workItem" >&$threadSyncPipeD; } &
  # Count the replacement task against the limit.
  ((ti++))
done
wait
echo "quit" >&$outputSyncPipeD

while read n; do
 if [[ $n == "quit" ]]; then
    break;
 else
    eval $6="\${$6}\ \$n"
 fi
 done <&$outputSyncPipeD;
}

main()
{
  if [ ! -p threadSyncPipe ]; then
     mkfifo threadSyncPipe
   fi

   if [ ! -p outputSyncPipe ]; then
      mkfifo outputSyncPipe
   fi

   exec 4<>threadSyncPipe
   exec 3<>outputSyncPipe
   gout=
   parallelizeTask "f1 f2 f3 f4 f5 f6" 2 mytask 4 3 gout

   echo "finalOutput: $gout";
   for f in $gout; do
       echo $f
   done

   rm outputSyncPipe
   rm threadSyncPipe
}

main
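
For comparison, the slot-limiting idea can be sketched more compactly with a token FIFO: one token per allowed job is preloaded, each job takes a token before it starts and hands it back when it finishes. This is only a sketch under assumptions: `mytask` here is a hypothetical stand-in that fails for `f3`, and the names `maxJobs`, `tokens` and `results` and the use of `mktemp -d` are illustrative, not part of the script above.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the real per-file work: fails only for f3.
mytask() { [ "$1" != "f3" ]; }

maxJobs=2
dir=$(mktemp -d)                      # assumption: mktemp is available
mkfifo "$dir/tokens" "$dir/results"
exec 4<>"$dir/tokens" 3<>"$dir/results"

# Preload one token (an empty line) per allowed concurrent job.
for ((i = 0; i < maxJobs; i++)); do echo >&4; done

count=0
for item in f1 f2 f3 f4 f5 f6; do
  read -r -u 4 token                  # blocks while maxJobs jobs are running
  {
    if mytask "$item"; then r=success; else r=failure; fi
    echo "$r:$item" >&3               # one short line per write
    echo >&4                          # hand the token back
  } &
  count=$((count + 1))
done
wait

# Collect exactly one result line per started job.
gout=
for ((i = 0; i < count; i++)); do
  read -r -u 3 line
  gout="$gout $line"
done
exec 3>&- 4>&-
rm -r "$dir"
echo "finalOutput:$gout"
```

Because the results are only read after `wait`, this relies on all pending result lines fitting in the pipe's buffer; with many or long results, the writers would block and a reader running alongside the jobs would be needed.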

I found the related post below, which answers my question. I have revised the title to make it more appropriate.

Are there repercussions to having many processes write to a single reader on a named pipe in posix?

neon
  • What are you trying to do in plain English? Your script seems like a complicated way to do something very similar to ` echo 'f1\nf2\nf3\nf4\nf5\nf6' | wargs -n1 -P2 mytask > gout `. Or I may be totally mistaken. – damienfrancois Oct 24 '13 at 20:58
  • A note on terminology: `bash` can start multiple processes, not threads. A thread is simply an execution path that shares its memory with other threads within a single process. – chepner Oct 24 '13 at 21:02
  • You test for and create `threadSyncPipe` twice; I suspect the second time you mean `outputSyncPipe`. – chepner Oct 24 '13 at 21:03
  • that's right @chepner, corrected the typo. – neon Oct 24 '13 at 21:12
  • Parallel or xargs can also help you do this. – Gcmalloc Oct 24 '13 at 21:15
  • @damienfrancois, by wargs you mean xargs. Looks like I'm re-inventing the wheel. With xargs, can we guarantee that the output of multiple instances of mytask will not be mixed up in the gout file? E.g. the mytask1 instance outputs "foo\n" and the mytask2 instance outputs "gear\n". Is there a possibility of the two lines getting mixed up, as in "fgeooar\n\n"? – neon Oct 24 '13 at 21:20
  • @neon oops yes sorry for the typo. xargs will not mess up the output. You will get either "foo\ngear\n" or "gear\nfoo\n". – damienfrancois Oct 24 '13 at 21:26
  • @damienfrancois Good to know. I'm curious how xargs guarantees this. As a learning exercise, I would like to know if there is any approach to achieve the same kind of functionality with my script. – neon Oct 24 '13 at 21:37
  • My system is SPARC 5.10; it doesn't have parallel, and the xargs it has doesn't support the -P option. As I don't have privileges to install or upgrade, I might have to go with my own implementation. – neon Oct 24 '13 at 22:31
  • I found the answer in a related post. According to it, writes to a FIFO are atomic as long as each written message is smaller than the page size (4k). Thank you all for the replies and suggestions. http://stackoverflow.com/questions/587727/are-there-repercussions-to-having-many-processes-write-to-a-single-reader-on-a-n – neon Oct 25 '13 at 15:35
  • I agree with the answer you found (with the minor caveat that the size is a system configuration parameter, but 4 KiB is about the normal size). Please add it as a self-answer to the question, then accept it, so there is closure on the question. – Jonathan Leffler Dec 30 '13 at 05:36
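
The `xargs` approach suggested in the comments could look roughly like the sketch below. It assumes an `xargs` that supports `-P` (GNU and BSD do; it is not part of POSIX `xargs`) and a `bash` on the `PATH`. The body of `mytask` is a hypothetical stand-in that prints one short line per item. As far as I know, `xargs` does not buffer its children's output, so lines stay intact only because each child writes them in small, single writes.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in: report each item with a single short line.
mytask() {
  echo "success:$1"
}
export -f mytask   # make the function visible to the bash children started by xargs

# Run at most 2 tasks at a time, one item per invocation, and capture all output.
gout=$(printf '%s\n' f1 f2 f3 f4 f5 f6 | xargs -n1 -P2 bash -c 'mytask "$1"' _)
echo "$gout"
```

The order of the lines in `gout` depends on which task finishes first, so it can differ between runs.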

1 Answer


I found the answer in the related post linked below. According to it, a write to a FIFO is atomic as long as the message is no larger than PIPE_BUF bytes. That limit is commonly quoted as 4 KiB (it is 4096 bytes on Linux), but it depends on the system; POSIX only guarantees at least 512 bytes.

Are there repercussions to having many processes write to a single reader on a named pipe in posix?
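
A quick way to check the limit on a given system, and to see short writes from several processes arrive intact, is sketched below. The names (`writers`, `lines`, the FIFO `out`) and the use of `mktemp -d` are assumptions made for illustration. It is a small demonstration, not a proof of atomicity.

```bash
#!/usr/bin/env bash
# Upper bound on a write that is guaranteed not to interleave (queried for pipes under /).
pipe_buf=$(getconf PIPE_BUF /)
echo "PIPE_BUF=$pipe_buf"

dir=$(mktemp -d)                 # assumption: mktemp is available
mkfifo "$dir/out"
exec 3<>"$dir/out"

writers=4
lines=5
for ((w = 1; w <= writers; w++)); do
  for ((i = 1; i <= lines; i++)); do
    echo "writer$w:line$i" >&3   # each echo is one short write, far below PIPE_BUF
  done &
done
wait

# Read back every line and count the ones that do not match the expected shape.
count=0
bad=0
while [ "$count" -lt $((writers * lines)) ]; do
  read -r -u 3 line
  [[ $line =~ ^writer[1-4]:line[1-5]$ ]] || bad=$((bad + 1))
  count=$((count + 1))
done
exec 3>&-
rm -r "$dir"
echo "read $count lines, $bad malformed"
```

Lines from different writers can arrive in any order; the guarantee only concerns each individual write staying in one piece.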

Thank you all for the replies and suggestions.
