Consider the following scenario:
We want to take a large distributed collection of objects and, for each object in the collection, kick off another computation that uses the current object and another large distributed collection to compute a result that transforms the current object.
E.g.
collection A: 1,2,3,4,5,6,7,8......
collection B: 1,2,3,4,5,6,7,8......
For each value in A, we iterate over all the values in B, multiply each by 2, and sum the results. We then map each value in A to this sum multiplied by the current A value.
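To make the intended result concrete, here is a minimal single-machine sketch of the same arithmetic in plain Java, with no Jet involved. The class and method names (`NestedSumSketch`, `doubledSum`, `transform`) are made up for illustration, and the inputs are the three-element lists used in the attempt further down.

```java
import java.util.List;
import java.util.stream.Collectors;

public class NestedSumSketch {

    // Sum of every value in b after multiplying it by 2.
    static int doubledSum(List<Integer> b) {
        return b.stream().mapToInt(i -> i * 2).sum();
    }

    // For each value in a, recompute the doubled sum over b
    // (mirroring the nested computation) and multiply it by that value.
    static List<Integer> transform(List<Integer> a, List<Integer> b) {
        return a.stream()
                .map(e -> e * doubledSum(b))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 3);
        List<Integer> b = List.of(1, 2, 3);
        // The doubled sum of b is 2 + 4 + 6 = 12.
        System.out.println(transform(a, b)); // prints [12, 24, 36]
    }
}
```

So with A = 1,2,3 and B = 1,2,3, the expected content of the output collection is 12, 24, 36.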
Below is my attempt. It deadlocks when the inner job is submitted with:
c2.newJob(p2).join()
There is no deadlock when the following is used instead:
c2.newJob(p2)
However, we need p2 to complete before reading its output, to ensure we get the correct sum.
This might seem like a non-idiomatic way of using Jet for this specific use case. However, I want to use this pattern to solve other problems, so I would greatly appreciate your help with this.
JetInstance jet = Jet.newJetInstance();
JetInstance c1 = Jet.newJetClient();
Pipeline p1 = Pipeline.create();
List<Integer> aIn = jet.getList("a-in");
aIn.add(1);
aIn.add(2);
aIn.add(3);
p1.drawFrom(Sources.list("a-in"))
  .map(e -> {
      Pipeline p2 = Pipeline.create();
      JetInstance c2 = Jet.newJetClient();
      List<Integer> bIn = c2.getList("b-in");
      bIn.add(1);
      bIn.add(2);
      bIn.add(3);
      p2.drawFrom(Sources.list("b-in"))
        .map(i -> ((Integer) i) * 2)
        .drainTo(Sinks.list("b-out"));
      List<Integer> bOut = c2.getList("b-out");
      // I would have thought it should just wait for the computation to complete;
      // instead, the join here causes Jet to block itself.
      c2.newJob(p2).join();
      int sum = 0;
      for (Integer i : bOut) {
          sum += i;
      }
      return ((Integer) e) * sum;
  }).drainTo(Sinks.list("a-out"));
c1.newJob(p1).join();
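For comparison, the same symptom can be reproduced outside Jet with a plain Java thread pool. This is only an analogy, not a description of Jet's internals: it assumes, without confirmation, that the outer map function and the inner job compete for the same limited set of worker threads. The class and method names (`BlockingJoinSketch`, `runNested`) are hypothetical, and a timeout is used so that the sketch terminates instead of hanging.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockingJoinSketch {

    // Submits an "outer" task that itself submits an "inner" task to the
    // same pool and then blocks waiting for the inner result.
    static String runNested(ExecutorService pool, long timeoutMs) throws Exception {
        Callable<String> outerTask = () -> {
            Callable<Integer> innerTask = () -> 2 + 4 + 6;
            Future<Integer> inner = pool.submit(innerTask);
            try {
                // Blocks the worker thread that is running the outer task.
                return "inner result: " + inner.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                inner.cancel(true);
                return "inner never ran";
            }
        };
        return pool.submit(outerTask).get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService single = Executors.newSingleThreadExecutor();
        ExecutorService two = Executors.newFixedThreadPool(2);
        try {
            // One worker: the outer task holds it, so the inner task stays queued.
            System.out.println(runNested(single, 500));
            // Two workers: the inner task gets its own thread and completes.
            System.out.println(runNested(two, 500));
        } finally {
            single.shutdownNow();
            two.shutdownNow();
        }
    }
}
```

With a single worker the outer task waits for work that can only run on the thread it is occupying; without the timeout it would wait forever. If something similar applies to Jet, it would be consistent with join() blocking while a bare newJob(p2) does not.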