Perl - updating and merging data after merging threads

Question

Following this question and other questions I asked, I got some suggestions.

tl;dr:

I'm trying to run a "foreach" loop asynchronously. Each iteration updates a few hashes independently. The problem is that each thread keeps its own copy of the memory, and I don't know how to combine the results afterwards.

I got a few suggestions, but I had a problem with almost every one of them:

When I tried threads/fork, there was a problem with shared memory: I needed to mark everything as shared, and only shared values can be assigned to those hashes, which made a big mess... (If there's a way to share everything, even variables that are defined later, that might be a solution.)
When I tried writing all the hashes to files (as JSON), all the blessing was lost and I needed to re-bless everything from the top, which is a big mess too...

Any ideas on how I can do this more easily or faster?

It would be easier to give help if you provided some code examples. See [mcve] for more information — Håkon Hægland, Sep 01 '21 at 07:59
"_bless_" ? What is blessed? Are those objects that you create in threads, and would like to return to the main thread? Is this work IO-bound (involve the filesystem a lot) or not? -- Are your threads working mainly with files or "just" computing away? Should really tell us more... the question is broad anyway. — zdim, Sep 01 '21 at 17:34

zdim · Answer 1 · 2021-09-07T05:40:49.260

In some problems a common data structure must indeed be shared between different threads.

When this isn't necessary, things are greatly simplified by returning from each thread, when join-ing, a reference to a data structure built in the thread during the run. The main thread can then process those results (or merge them first if needed). Here is a simple demo:

use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd);  # or use core Data::Dumper

use threads;

# Start threads.  Like threads->create(...)
my @thr = map { async { proc_thr($_) } } 1..3;

# Wait for threads to complete. If they return, that happens here
my @res = map { $_->join } @thr;

# Process results (just print in this case)
dd $_ for @res;
   
sub proc_thr { 
    my ($num) = @_; 

    # A convoluted example, to return a complex data structure
    my %ds = map { 'k'.$_ => [$_*10 .. $_*10 + 2] } 10*$num .. 10*$num+2;

    return \%ds;
}

This prints

{ k10 => [100, 101, 102], k11 => [110, 111, 112], k12 => [120, 121, 122] }
{ k20 => [200, 201, 202], k21 => [210, 211, 212], k22 => [220, 221, 222] }
{ k30 => [300, 301, 302], k31 => [310, 311, 312], k32 => [320, 321, 322] }

Now manipulate these returned data structures as suitable; work with them as they stand or merge them. I can't discuss that because we aren't told what kind of data need be passed around. This roughly provides for what was asked for, as far as I can tell.
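As one possible way to merge, here is a minimal sketch. It assumes the results are hashrefs of arrayrefs, like the ones the demo returns, and that values under a repeated key should simply be appended. The name merge_results and the sample data are made up for illustration; real data may need a different collision rule.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for what each thread's join() returned
my @res = (
    { k10 => [100, 101, 102], k11 => [110, 111, 112] },
    { k20 => [200, 201, 202], k11 => [999] },   # k11 collides on purpose
);

# Merge hashrefs of arrayrefs; on a key collision, append the values
sub merge_results {
    my @parts = @_;
    my %merged;
    for my $part (@parts) {
        for my $key (keys %$part) {
            push @{ $merged{$key} }, @{ $part->{$key} };
        }
    }
    return \%merged;
}

my $all = merge_results(@res);

# Print keys in sorted order so the output is stable
printf "%s => [%s]\n", $_, join(', ', @{ $all->{$_} }) for sort keys %$all;
```

With the real program, @res would be the list built from $_->join, and merge_results would run once in the main thread after all threads have been joined.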

Important notes

Lots of threads? Large data structures to merge? Then this may not be a good way, since the returned data gets copied into the main thread when joining.
The word "bless" was mentioned, tantalizingly. If what you'd pass around are objects then they need be serialized for that, in such a way that the main thread can reconstruct the object.

Or, pass the object's data, either as a reference (as above), or by serializing it and passing the string; then the main thread can populate its own object from that data.

Returning (join-ing) an object itself (so a reference like above, an object being a reference) doesn't seem to fully "protect your rights;" I find that at least some operator overloading is lost (even as all methods seem to work and data is accessible and workable).

This is a whole other question, of passing objects around.^†
If the work to be done is I/O-bound (lots of work with the filesystem) then this whole approach (with threads) need be carefully reconsidered. It may even slow things down.

Altogether -- we need more of a description, and way more detail.

^† This has been addressed on Stack Overflow. A couple of directly related pages that readily come to mind for me are here and here.

In short, objects can be serialized and restored using for example Storable. Pure JSON cannot do objects, while extensions can. On the other hand, pure JSON is excellent for serializing data in an object, which can then be used on the other end to populate an identical object of that class.
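A rough sketch of both routes, using only core modules (Storable and JSON::PP). The Point class is hypothetical, made up for this example, and the sketch assumes the object is a plain blessed hash whose constructor accepts its own fields back. The receiving side must have the class loaded in either case.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use JSON::PP;

# Hypothetical class, only for illustration
package Point {
    sub new {
        my ($class, %args) = @_;
        return bless { x => $args{x} // 0, y => $args{y} // 0 }, $class;
    }
    sub norm2 { my ($self) = @_; return $self->{x}**2 + $self->{y}**2 }
}

my $obj = Point->new(x => 3, y => 4);

# Route 1: Storable records the class name, so thaw returns a blessed object
my $frozen = freeze($obj);
my $copy   = thaw($frozen);
print ref($copy), ' ', $copy->norm2, "\n";

# Route 2: plain JSON carries only the data; rebuild the object from it
my $json    = JSON::PP->new->canonical;
my %data    = %$obj;                        # unblessed shallow copy of the fields
my $string  = $json->encode(\%data);
my $rebuilt = Point->new(%{ $json->decode($string) });
print ref($rebuilt), ' ', $rebuilt->norm2, "\n";
```

In a threaded program, $frozen or $string would be what the thread returns, and the thaw or decode step would run in the main thread after join.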

Hi, sorry for not commenting, we had holidays here :) I've decided to move forward with another way (YAML and back to merging blessed hashes). The main reason is the first and second drawback you entered. — urie, Sep 09 '21 at 08:10
