5

I currently have a script that kicks off threads to perform various actions on several directories. A snippet of my script is:

#main
sub BuildInit {

    my $actionStr = "";
    my $compStr   = "";

    my @component_dirs;
    my @compToBeBuilt;
    foreach my $comp (@compList) {
        @component_dirs = GetDirs($comp);    #populates @component_dirs
    }

    print "Printing Action List: @actionList\n";

    #---------------------------------------
    #----   Setup Worker Threads  ----------
    for ( 1 .. NUM_WORKERS ) {
        async {
            while ( defined( my $job = $q->dequeue() ) ) {
                worker($job);
            }
        };
    }

    #-----------------------------------
    #----   Enqueue The Work  ----------
    for my $action (@actionList) {
        my $sem = Thread::Semaphore->new(0);
        $q->enqueue( [ $_, $action, $sem ] ) for @component_dirs;

        $sem->down( scalar @component_dirs );
        print "\n------>> Waiting for prior actions to finish up... <<------\n";
    }

    # Nothing more to do - notify the Queue that we're not adding anything else
    $q->end();
    $_->join() for threads->list();

    return 0;
}

#worker
sub worker {
    my ($job) = @_;
    my ( $component, $action, $sem ) = @$job;
    Build( $component, $action );
    $sem->up();
}

#builder method
sub Build {

    my ( $comp, $action ) = @_;
    my $cmd     = "$MAKE $MAKE_INVOCATION_PATH/$comp ";
    my $retCode = -1;

    given ($action) {
        when ("depend") { $cmd .= "$action >nul 2>&1" }    #suppress output
        when ("clean")  { $cmd .= $action }
        when ("build")  { $cmd .= 'l1' }
        when ("link")   { $cmd .= '' }                     #add nothing; default is to link
        default { die "Action: $action is unknown to me." }
    }

    print "\n\t\t*** Performing Action: \'$cmd\' on $comp ***" if $verbose;

    if ( $action eq "link" ) {

        # hack around potential race conditions -- will only be an issue during linking
        my $tries = 1;
        until ( $retCode == 0 or $tries == 0 ) {
            last if ( $retCode = system($cmd) ) == 2;      #compile error; stop trying
            $tries--;
        }
    }
    else {
        $retCode = system($cmd);
    }
    push( @retCodes, ( $retCode >> 8 ) );

    #testing
    if ( $retCode != 0 ) {
        print "\n\t\t*** ERROR IN $comp: $@ !! ***\n";
        print "\t\t*** Action: $cmd -->> Error Level: " . ( $retCode >> 8 ) . "\n";

        #exit(-1);
    }

    return $retCode;
}

The print statement I'd like to be thread-safe is: print "\n\t\t*** Performing Action: \'$cmd\' on $comp ***" if $verbose; Ideally, I would like to have this output, and then each component that is having the $action performed on it, would output in related chunks. However, this obviously doesn't work right now - the output is interleaved for the most part, with each thread spitting out it's own information.

E.g.,:

ComponentAFile1.cpp
ComponentAFile2.cpp
ComponentAFile3.cpp
ComponentBFile1.cpp
ComponentCFile1.cpp
ComponentBFile2.cpp
ComponentCFile2.cpp
ComponentCFile3.cpp
... etc.

I considered executing the system commands using backticks, and capturing all of the output in a big string or something, then output it all at once, when the thread terminates. But the issue with this is (a) it seems super inefficient, and (b) I need to capture stderr.

Can anyone see a way to keep my output for each thread separate?

clarification: My desired output would be:

ComponentAFile1.cpp
ComponentAFile2.cpp
ComponentAFile3.cpp
-------------------  #some separator
ComponentBFile1.cpp
ComponentBFile2.cpp
-------------------  #some separator
ComponentCFile1.cpp
ComponentCFile2.cpp
ComponentCFile3.cpp
... etc.
MrDuk
  • 16,578
  • 18
  • 74
  • 133

3 Answers3

5

To ensure your output isn't interrupted, access to STDOUT and STDERR must be mutually exclusive. That means that between the time a thread starts printing and finishes printing, no other thread can be allowed to print. This can be done using Thread::Semaphore[1].

Capturing the output and printing it all at once allows you to reduce the amount of time a thread holds a lock. If you don't do that, you'll effectively make your system single-threaded system as each thread attempts lock STDOUT and STDERR while one thread runs.

Other options include:

  1. Using a different output file for each thread.
  2. Prepending a job id to each line of output so the output can be sorted later.

In both of those cases, you only need to lock it for a very short time span.


  1. # Once
    my $mutex = Thread::Semaphore->new();  # Shared by all threads.
    
    
    # When you want to print.
    $mutex->down();
    print ...;
    STDOUT->flush();
    STDERR->flush();
    $mutex->up();
    

    or

    # Once
    my $mutex = Thread::Semaphore->new();  # Shared by all threads.
    STDOUT->autoflush();
    STDERR->autoflush();
    
    
    # When you want to print.
    $mutex->down();
    print ...;
    $mutex->up();
    
ikegami
  • 367,544
  • 15
  • 269
  • 518
  • Could use `$thr->tid()` for the second suggestion – Zaid Apr 14 '14 at 16:44
  • @Zaid, Job ID would be more useful, but yes, the thread id would do quite well too. – ikegami Apr 14 '14 at 16:46
  • This is super helpful, thanks. I've updated my original question with some clarification for what I'm looking to do - I think your answer hints at ways to do this though, specifically with prepending some identifier and then sorting that for output. If I've misunderstood, please let me know. – MrDuk Apr 14 '14 at 17:03
  • The clarification wasn't needed; I already understood correctly. – ikegami Apr 14 '14 at 17:21
  • IPC::Run3 (and IPC::Run) make it easy to capture STDOUT and STDERR. – ikegami Apr 14 '14 at 17:24
2

You can utilize the blocking behavior of $sem->down if it attempts to decrease the semaphore counter below zero, as mentioned in perldoc perlthrtut:

If down() attempts to decrement the counter below zero, it blocks until the counter is large enough.


So here's what one could do:

Initialize a semaphore with counter 1 that is shared across all threads

my $sem = Thread::Semaphore->new( 1 );

Pass a thread counter to worker and Build

for my $thr_counter ( 1 .. NUM_WORKERS ) {
    async {
        while ( defined( my $job = $q->dequeue() ) ) {
            worker( $job, $thr_counter );
        }
    };
}

sub worker {
   my ( $job, $counter ) = @_;

   Build( $component, $action, $counter );
}

Go ->down and ->up inside Build (and nowhere else)

sub Build {
    my ( $comp, $action, $counter ) = @_;

    ... # Execute all concurrently-executed code here

    $sem->down( 1 << ( $counter -1 ) );

    print "\n\t\t*** Performing Action: \'$cmd\' on $comp ***" if $verbose;

    # Execute all sequential 'chunks' here

    $sem->up( 1 << ( $counter - 1) );
}

By using the thread counter to left-shift the semaphore counter, it guarantees that the threads won't trample on one another:

+-----------+---+---+---+---+
| Thread    | 1 | 2 | 3 | 4 |
+-----------+---+---+---+---+
| Semaphore | 1 | 2 | 4 | 8 |
+-----------+---+---+---+---+
Zaid
  • 36,680
  • 16
  • 86
  • 155
  • The second `...` contains print statements and/or a call to `system`, so your `...` and `up` are in the wrong order, and it means only one thread will execute at a time. – ikegami Apr 14 '14 at 16:30
  • I don't see any reason to not perform the `system` call prior to the semaphore block. – Zaid Apr 14 '14 at 16:33
  • 2
    Because you could end up with `output from system from thread 1\noutput from system from thread 2\nmore output from system from thread 1\n`, which he's asking how to avoid. The output from his threads are interleaved, and he wants to stop this from happening. – ikegami Apr 14 '14 at 16:35
  • Point well taken. I was so focused on transmitting the bit-shifting idea that I forgot `system` shares `STDOUT`. Oh well – Zaid Apr 14 '14 at 16:43
2

I've approached this problem differently in the past, by creating an IO thread, and using that to serialise the file access.

E.g.

my $output_q = Thread::Queue -> new();

sub writer {
    open ( my $output_fh, ">", $output_filename );
    while ( my $line = $output_q -> dequeue() ) {
        print {$output_fh} $line; 
    }
    close ( $output_fh );
 }

And within threads, 'print' by:

$output_q -> enqueue ( "text_to_print\n"; );

Either with or without a wrapper - e.g. for timestamping statements if they're going to a log. (You probably want to timestamp when queued, rather than when actually printer).

Sobrique
  • 52,974
  • 7
  • 60
  • 101