I have a large program, and somewhere in the middle of it there is a foreach loop that fills some dictionaries (hashes). Each iteration is independent and fills those dictionaries with disjoint keys.

I'm trying to make that foreach loop multithreaded in order to reduce the run time.

In the following example, $a1 and $b1 are references to dictionaries.

I tried "threads::shared" this way:

my $a1 = {};    
my $b1 = {};
my $c1 = {};
my $d1 = {};

# a lot of code using $a1 and $b1

share($a1);
share($b1);
share($c1);
share($d1);
my @threads;
foreach my $Inst ( sort keys %{ $a1->{ports} } ) {
    push( @threads, threads->create('some_func', $Inst, $a1, $b1, $c1, $d1, $e ...) );
}
for my $thr (@threads) {
    $thr->join();
}

# all the other code

But I get an error of:

Invalid value for shared scalar at ...

Any ideas how to get the data structures filled without interfering with the code before and after the foreach loop?

urie
  • Your question is obscure -- you declare `a1,b1,c1,d1` as shared and then you say that they are filled on each iteration, without showing any code for how that happens. If they are filled on each iteration, then it sounds like they should not be shared. And your sentence -- "but not that it would interfere with the "#some code" section and not with the "#some other code" section" -- creates even more confusion. – Polar Bear Aug 22 '21 at 08:24
  • @PolarBear Thanks for your comment, I'll try to edit the question – urie Aug 22 '21 at 08:26
  • According to [the documentation](https://perldoc.perl.org/threads::shared), shared scalar references can only store references to shared variables or shared data. For example, if `$a1` references a dictionary `%h1`, then you need to share `%h1` too. – Håkon Hægland Aug 22 '21 at 08:33
  • @HåkonHægland I saw that, but I don't want to go over the whole code and change everything to shared... it would take me forever. Any ideas how could I avoid that? – urie Aug 22 '21 at 08:34
  • @urie "_how could I avoid that_" --- Use `shared_clone` from `threads::shared`. I suggested this in comments to your two previous questions, one now deleted (exact same as this one, I believe). There was a link to a [post with an example](https://stackoverflow.com/a/46705209/4653379) as well. Did you not look at any of that at all? – zdim Aug 22 '21 at 08:51
  • @urie It seems it is not possible to make a hash shared after it has been created/declared without losing the data in the hash. If you do `share(%$a1)` and `$a1` references a hash `%h1`, the data in `%h1` is lost. As zdim mentioned, I think you need to use `shared_clone`. – Håkon Hægland Aug 22 '21 at 09:09
  • Please, post a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). That is, a _small_ piece of code that reproduces the error, so that we can run it and clearly understand what the issue is. – Dada Aug 22 '21 at 09:13
  • @zdim Sorry if it seems that I ignored your answer to the previous post. I looked at your answer with `shared_clone`, but didn't understand it. I looked for another example but couldn't find one, and the explanation in the official documentation wasn't enough for me. But now that I have an example from Håkon Hægland, I hope it will work! Thanks for your help and everyone else's :) – urie Aug 22 '21 at 10:26

1 Answer

It is not possible to make a hash shared after it has been created/declared without losing the data in the hash. Instead, you could try using shared_clone(), which returns a shared deep copy and leaves the original untouched, like this:

use feature qw(say);
use strict;
use warnings;
use threads ;
use threads::shared ;
use Data::Dumper qw(Dumper);

my %h1 = (a => 1, b => 2);
my %h2 = (c => 3, d => 4);

my $a1 = \%h1;
my $b1 = \%h2;

my $a1c = shared_clone($a1);
my $b1c = shared_clone($b1);
my $lockvar:shared;

my $nthreads = 3;
for ( 1..$nthreads ) {
    threads->create('job_to_parallelize', $a1c, $b1c, \$lockvar ) ;
}
$_->join() for threads->list();

sub job_to_parallelize {
    my ($a1, $b1, $lockvar) = @_;
    {
        lock $lockvar;
        $a1->{a}++;
        $b1->{d}++;
    }
}

print Dumper({a1c => $a1c});
print Dumper({b1c => $b1c});

Output:

$VAR1 = {
          'a1c' => {
                     'a' => 4,
                     'b' => 2
                   }
        };
$VAR1 = {
          'b1c' => {
                     'd' => 7,
                     'c' => 3
                   }
        };
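
Applied to a loop like the one in the question, where every iteration writes only its own keys, the same idea might look roughly like the sketch below. It is a minimal illustration and not the asker's real code: `fill_instance`, the contents of `ports`, and the `width` field are hypothetical. It also shows that a freshly built nested hash has to be made shared (here with `shared_clone`) before it is stored in a shared hash; storing a plain `{}` is what raises "Invalid value for shared scalar".

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical stand-in for the data built by the earlier code.
my $a1 = { ports => { in0 => 1, in1 => 1, out0 => 1 } };
my $b1 = {};

# Deep-copy into shared versions; the originals are left untouched.
my $a1s = shared_clone($a1);
my $b1s = shared_clone($b1);

# Hypothetical worker: each thread writes only under its own key.
sub fill_instance {
    my ($inst, $a, $b) = @_;
    # A plain {} is not shared, so clone it before storing it.
    my $entry = shared_clone({ name => $inst, width => length $inst });
    # lock() follows one level of reference, so this locks the shared hash.
    lock($b);
    $b->{$inst} = $entry;
    return;
}

my @threads;
foreach my $inst ( sort keys %{ $a1s->{ports} } ) {
    push @threads, threads->create(\&fill_instance, $inst, $a1s, $b1s);
}
$_->join() for @threads;

print "$_ => $b1s->{$_}{width}\n" for sort keys %$b1s;
```

After the joins, `$b1s` holds one nested entry per port name, while `$a1` and `$b1` are exactly as the earlier code left them.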
Håkon Hægland
  • How would you suggest combining everything together at the end? (In case of a complex dictionary, like dictionary of dictionaries of dictionaries) – urie Aug 22 '21 at 11:52
  • `shared_clone()` should be doing a deep copy. According to [the documentation](https://perldoc.perl.org/threads::shared) : *"returns a shared version of its argument, performing a deep copy on any non-shared elements"*. Did you try it? I do not see exactly what I can add to my answer that is not clear. Please clarify. – Håkon Hægland Aug 22 '21 at 12:06
  • I did that, and it made a deep copy :) My question is this: After one loop, I have a partly-filled data-structure. How do I update the "main" data-structure with the new values? Is there an easy way besides iterating through the whole keys/values and checking if there were any updates somewhere? If it's still unclear, I'll try to give a concrete example – urie Aug 22 '21 at 13:11
  • *"After one loop, I have a partly-filled data-structure."* : What do you mean by "one loop" ? In your example there is only the `thr->join()` loop. You mean that after that loop finishes, the data structure is only partly filled? Why is it so? Please give more details. – Håkon Hægland Aug 22 '21 at 14:13
  • I mean that EACH loop builds its own data structure (a hash, in my case). How do I combine all of those hashes together? I just want to explain that it's not just hash => {3 => 4, "mom" => "dad"}, but a very complex hash that holds hashes and arrays inside it. So just looping over the hash won't be enough. I'll try to open another question if I don't find a solution online. – urie Aug 24 '21 at 07:38
  • *"I mean that EACH loop builds its own data structure"* It is not clear to me what this means. By "each loop", do you mean "each thread"? You are passing all the threads the same hash refs (`$a1`, `$b1`, `$c1`, ...), so how is each thread building its own data structure? Do you mean that the first thread modifies `$a1`, the second modifies `$b1`, and so on? Or something else? – Håkon Hægland Aug 24 '21 at 07:45
  • You're right, my bad. Each thread (one is started for every iteration of the foreach loop) is building its own hash. How do I combine them all together EASILY, without going through every key in the dictionary? – urie Aug 24 '21 at 08:12
  • *"is building its own hash"* : So each thread is modifying the input hashes (`$a1`, `$b1`,...) etc. Then when the loop is finished the result is a combination of the modifications made by each thread. So you already have a combined solution. Please clarify if I am missing something. – Håkon Hægland Aug 24 '21 at 08:25
  • Thanks for your help. I posted a follow-up question here: https://stackoverflow.com/questions/68907848/perl-how-can-i-edit-the-structure-of-a-shared-clone – urie Aug 24 '21 at 12:46
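
For the case discussed in these comments, where every thread really does build a hash of its own, one possible way to combine the results is sketched below. It assumes the top-level keys produced by different threads never collide, and it is not taken from the answer above: `build_part` and its data are hypothetical. Each thread returns a reference to an ordinary (non-shared) nested hash, `join()` hands a copy of that value back to the parent, and a hash slice merges the top-level keys without walking the nested levels.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: builds and returns a private nested structure.
sub build_part {
    my ($inst) = @_;
    return { $inst => { pins => [ 1 .. 2 ], attrs => { name => $inst } } };
}

my %merged;
my @threads;
for my $inst (qw(u1 u2 u3)) {
    # Ask for scalar context so join() returns the single hash reference.
    my $thr = threads->create({ context => 'scalar' }, \&build_part, $inst);
    push @threads, $thr;
}
for my $thr (@threads) {
    my $part = $thr->join();
    # Top-level keys are assumed disjoint, so the slice overwrites nothing.
    @merged{ keys %$part } = values %$part;
}

print join(', ', sort keys %merged), "\n";
```

With this approach nothing needs to be shared at all, at the cost of copying each thread's result once when it is joined.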