
I want to control how the logging of all the child processes is interleaved in the output file.

Code Snippet (file1.pl):

my @sitesForScr = ("abc_10","def_5","ghi_16");
foreach my $siteToRunOn (@sitesForScr) {
    my $jkpid;
    if ($jkpid = fork()) {
        $SIG{CHLD} = 'DEFAULT';
    }
    elsif (defined ($jkpid)) {
        linkFunc($siteToRunOn);
        exit 0;
    }
}


sub linkFunc {
    print "$_[0]\n";
    my @ert=split("_",$_[0]);
    print "Waiting on $_[0] for $ert[1] sec\n";
    sleep $ert[1];
    print "Done for $_[0]\n";
}

What I want is for the logging of the first child process to complete first, then the logging of the second child process to start; when that completes, the logging of the next child process starts, and so on.

With the above code, the output in the file (fileoutput.txt) when running "perl file1.pl >> /pan/sedrt/fileoutput.txt" is:

abc_10
Waiting on abc_10 for 10 sec
def_5
Waiting on def_5 for 5 sec
ghi_16
Waiting on ghi_16 for 16 sec
Done for def_5
Done for abc_10
Done for ghi_16

Expected Output on running command "perl file1.pl >> /pan/sedrt/fileoutput.txt":

abc_10
Waiting on abc_10 for 10 sec
Done for abc_10
def_5
Waiting on def_5 for 5 sec
Done for def_5
ghi_16
Waiting on ghi_16 for 16 sec
Done for ghi_16

How can this be done?

Thanks!

PPP
  • If you want your processes to execute one after the other, why use multiple processes at all? – Dada May 11 '22 at 07:01
  • Or, do you want the processes to actually execute in parallel, but still have the output look like they were executed sequentially? – Dada May 11 '22 at 07:01
  • Isn't the point of forking that you divide a task into smaller parts, which execute as fast as possible, in parallel rather than serial. Yet you want them to be serial. What is it you are trying to achieve? – TLP May 11 '22 at 07:19
  • @Dada,yes, I want the processes to actually execute in parallel, but still have the output look like they were executed sequentially – PPP May 11 '22 at 07:28
  • @TLP, I want the tasks to be parallel, just the logging to be serial – PPP May 11 '22 at 07:31
  • @PPP If you include a timestamp, you can sort the log messages afterwards and restore serial order. Something like `2022-05-11_10:58:03 your message here`, and then `sort log.txt > log_sorted.txt` Or perhaps use a database. – TLP May 11 '22 at 08:59
  • This seems a pretty good scenario for using threads over forks. You can use `Thread::Queue` to collate and serialise output. – Sobrique May 11 '22 at 09:03

2 Answers


If by "logging" you mean that they all print to the console, as in the given example, then you can't really have them decoupled, since they all compete for a single resource (fd 1).

What you can do though, is to have each child assemble its log as it goes and in the end they all communicate them to the parent. Thus the integrity of those logs is preserved and the parent can then sort it out as needed.

Each process can write its log to a file, with a pre-determined name that the parent knows, or can pipe the name to the parent (if there is more to communicate anyway). Or, each can redirect its STDOUT to an in-memory variable, which it can then send over a pipe to the parent at the end.
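The last option can be sketched in a few lines (a minimal sketch; the variable names are placeholders, not from the question's code):

```perl
use strict;
use warnings;

# Open an in-memory "file": anything printed to $fh lands in $captured.
open my $fh, '>', \my $captured or die "open: $!";
my $old = select $fh;            # make $fh the default output handle
print "hello from the child\n";  # goes into $captured, not to the terminal
select $old;                     # restore the real STDOUT
close $fh;

print $captured;                 # a real child would pipe this back to the parent
```

Because `select` changes the default output handle, existing bare `print` statements in the child need no edits at all.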

So some communication management will be involved. Or, that can be done using a library -- for example, Parallel::ForkManager provides for easy communication from children back to the parent. And it makes the whole process easier as well.
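Done by hand, a pipe-per-child version of the question's loop could look like the sketch below (sleep times shortened so it finishes quickly; names follow the question's code):

```perl
use strict;
use warnings;

my @sitesForScr = ("abc_2", "def_1", "ghi_3");
my @readers;

foreach my $siteToRunOn (@sitesForScr) {
    pipe(my $r, my $w) or die "pipe: $!";
    my $jkpid = fork() // die "fork: $!";
    if ($jkpid == 0) {                 # child: build the whole log, then send it
        close $r;
        my @ert = split "_", $siteToRunOn;
        my $log = "$siteToRunOn\nWaiting on $siteToRunOn for $ert[1] sec\n";
        sleep $ert[1];
        $log .= "Done for $siteToRunOn\n";
        print {$w} $log;
        close $w;
        exit 0;
    }
    close $w;                          # parent keeps only the read end
    push @readers, $r;
}

# The children sleep in parallel, but each log is printed as one block,
# in launch order.
for my $r (@readers) {
    local $/;                          # slurp the whole pipe
    print scalar <$r>;
    close $r;
}
wait() for @sitesForScr;               # reap the children
```

Reading the pipes one at a time is what serialises the output; the work itself still overlaps, since all children are forked before the parent starts reading.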

Without communication between them, that is; trying to coordinate writes to the shared console directly would be extremely messy.

zdim
  • I tried using an in-memory variable in the above code, but it's not working. Can you please tell what is going wrong here? I tried `my $var; sub linkFunc { open (my $fh, ">", \$var); print $fh "$_[0]\n"; my @ert=split("_",$_[0]); print $fh "Waiting on $_[0] for $ert[1] sec\n"; sleep $ert[1]; print $fh "Done for $_[0]\n"; } print "$var is value of var variable\n";` Does it work when using fork? – PPP May 12 '22 at 06:22
  • 1
    @PPP Looks OK, hard to tell what/why is "_not working_" ... a one-liner demo: `perl -wE'$p = fork // die $!; if ($p==0) { open my $fh, ">", \my $so; say $fh "--> in kid $$"; say $fh "--> bye"; say "child $$ done, got stdout:\n---\n$so---"; exit }; wait'` (I mean you can copy this and paste on the command line and hit enter. Or write a program in a separate file of course.) – zdim May 12 '22 at 06:39
  • @PPP Then you'd want to pipe that variable (`$so` in my demo) to the parent. This is easy if you run your forks using `Parallel::ForkManager` as there is a ready system for that. (But it's not so hard by using pipes by hand either.) – zdim May 12 '22 at 06:44
  • how can i pipe the variable to the parent? Can you please show via a little code? – PPP May 12 '22 at 07:38
  • @PPP Here is a full [example](https://stackoverflow.com/a/61095493/4653379), with an extra bit. (Basics of it [here](https://stackoverflow.com/a/47951060/4653379) for example.) The `IO::Select` gets involved to nicely manage multiple pipes. Or, it's easier with `Parallel::ForkManager` -- see [this](https://stackoverflow.com/a/41891334/4653379) and [this](https://stackoverflow.com/a/40434889/4653379) and [this](https://stackoverflow.com/a/62230405/4653379). There's a lot more out there but I don't have time to search (my posts I can find quickly). I'll add to answer if I catch more time – zdim May 12 '22 at 08:08
  • 1
    @PPP I see now that I left out one helpful bit in my first comment above. After a "filehandle" to a variable is open (not a true filehandle in fact), with `open my $fh, ">", \my $so;`, then you can make _that_ filehandle default by `my $old = select $fh;` (we keep the old one in case we need to restore it but normally we don't in a fork). Now `say "hi";` goes to that filehandle, so to `$so`. (Btw, did the links in the previous comment help? Anything else/more?) – zdim May 27 '22 at 16:32
  • thanks. Ikegami in the comments below his main comment mentioned the same, and it actually helped. From your previous comments, for me no link helped. But still thanks that you shared. – PPP Jun 07 '22 at 05:27
  • You want the work to be performed simultaneously.
  • You want the output to appear grouped.

Yet, the processes are printing before doing the work (sleeping) and after. So the first thing that needs to be changed is that all of a process's output needs to be delayed until no more output will be produced by that process.

Whether you store the output in memory, store it in a file, or pipe it back to the parent is up to you. Example of the first:

sub linkFunc {
    my $out = "$_[0]\n";
    my @ert=split("_",$_[0]);
    $out .= "Waiting on $_[0] for $ert[1] sec\n";
    sleep $ert[1];
    $out .= "Done for $_[0]\n";
    print $out;
}

That's not quite enough, though. You need to ensure that the print is not interrupted by the other processes. You will need a mutex.

my $lock = acquire_lock();  # To be provided.

print $out;
select()->flush();

release_lock( $lock );      # To be provided.
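For example, `acquire_lock()` and `release_lock()` could be provided with an advisory `flock` on a shared lock file (a sketch; the lock file path is an assumption, and every process must use the same one):

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Mutual exclusion via an advisory lock on a shared lock file.
sub acquire_lock {
    open my $lock_fh, '>', '/tmp/file1.lock' or die "open lock: $!";
    flock $lock_fh, LOCK_EX or die "flock: $!";  # blocks until exclusive
    return $lock_fh;                             # lock held while handle lives
}

sub release_lock {
    my ($lock_fh) = @_;
    flock $lock_fh, LOCK_UN or die "unlock: $!";
    close $lock_fh;
}
```

Since the OS drops `flock` locks when the handle is closed, including when a process dies, a crashed child cannot leave the log permanently locked.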
ikegami
  • To your first method, there are 100s of print statements, and adding .= to each won't be possible. To your second part, what is acquire_lock() and release_lock()? Can you give any reference? – PPP May 12 '22 at 06:26
  • 1
    It would be less invasive to create a file handle to a string (`open( my $fh_out, '>', \my $out );`) and make it the default file handle (`select($fh_out)`). /// As indicated, this is something you need to provide. There are multiple ways you can provide mutual exclusion. That's up to you. – ikegami May 12 '22 at 13:08