I want to implement a process where I load two kinds of data, let's say Kind A and Kind B, as PCollection<A> a1 and PCollection<B> b1. Then I create a View.asMap() from a1 and pass it as a side input to a DoFn dfn1 that is applied to b1. This DoFn uses some of the values of Kind A and outputs them. Afterwards, I want to create a new PCollection<A> a2 that holds all the objects of a1, but with the ones that dfn1 output replaced by their new versions.
Let's say a1 holds the objects o1, b1, c1, d1, e1, f1, g1.
dfn1 manipulates and outputs b1 -> b2, c1 -> c2, g1 -> g2 to PCollection<A> a2.
The new PCollection combined from a1 and a2 should contain o1, b2, c2, d1, e1, f1, g2.
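To pin down the semantics I have in mind, here is a minimal sketch in plain Java with no Beam involved. It assumes every object can be reduced to a String key (the hypothetical keys "o", "b", "c", ... below); the class name MergeSketch and the method mergeWithUpdates are made up for illustration only. Every object of a1 that dfn1 did not touch, including d1, is carried over unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MergeSketch {
    // Returns a copy of `original` in which every key that also appears in
    // `updates` carries the updated value. Keys that exist only in `updates`
    // are ignored, because the result should hold exactly the keys of a1.
    static Map<String, String> mergeWithUpdates(Map<String, String> original,
                                                Map<String, String> updates) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : original.entrySet()) {
            String key = entry.getKey();
            merged.put(key, updates.containsKey(key) ? updates.get(key) : entry.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        // a1: the original objects, keyed by a hypothetical id.
        Map<String, String> a1 = new LinkedHashMap<>();
        for (String key : new String[] {"o", "b", "c", "d", "e", "f", "g"}) {
            a1.put(key, key + "1");
        }
        // a2: the objects that dfn1 manipulated and emitted.
        Map<String, String> a2 = new LinkedHashMap<>();
        a2.put("b", "b2");
        a2.put("c", "c2");
        a2.put("g", "g2");

        System.out.println(mergeWithUpdates(a1, a2).values());
        // prints [o1, b2, c2, d1, e1, f1, g2]
    }
}
```

In the real pipeline both sides would be PCollections rather than in-memory maps, so this only describes the desired result, not how to compute it at scale.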
Is there a built-in mechanism to accomplish something like that? The collections may be keyed before the "merge".
Thanks in advance.
As I am not satisfied with my English explanation of the problem, here is a DoFn that does what I am asking for. The real question is whether there is a built-in transform that can do something like this, ideally without manually creating a view beforehand.
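import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollectionView;

// For each keyed element, emits the master value if the side-input map has that key,
// otherwise passes the element's own value through.
public class CombineKvCollectionsWithMasterCollection extends DoFn<KV<String, Object>, Object> {
    private static final long serialVersionUID = 4100849850259729106L;

    // Side-input view of the master collection, keyed the same way as the main input.
    private final PCollectionView<Map<String, Object>> masterView;

    public CombineKvCollectionsWithMasterCollection(PCollectionView<Map<String, Object>> masterView) {
        this.masterView = masterView;
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        KV<String, Object> kv = c.element();
        Map<String, Object> masterMap = c.sideInput(masterView);
        if (masterMap.containsKey(kv.getKey())) {
            // The master collection has an entry for this key, so it wins.
            c.output(masterMap.get(kv.getKey()));
        } else {
            c.output(kv.getValue());
        }
    }
}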