
I want to implement a process in which I load two kinds of data, say Kind A and Kind B, into PCollection<A> a1 and PCollection<B> b1. I then create a View.asMap() from a1 and pass it as a side input to a DoFn dfn1 that is applied to b1. This DoFn uses some of the values of Kind A, modifies them, and outputs them to a PCollection<A> a2. Afterwards, I want to create a new PCollection<A> that holds all the objects of a1, but with the ones that were output by dfn1 replaced by their new versions.

Let's say a1 holds the objects o1, b1, c1, d1, e1, f1, g1. dfn1 manipulates and outputs b1 -> b2, c1 -> c2, g1 -> g2 to PCollection<A> a2.

The new PCollection combined from a1 and a2 should then contain o1, b2, c2, d1, e1, f1, g2.
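To make the expected result concrete, here is a minimal plain-Java sketch of the merge semantics I have in mind, with no Beam involved. The class name MergeSketch, the merge helper, and the single-letter keys are made up purely for illustration; d1 is carried over unchanged because dfn1 never touches it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: ordinary maps stand in for the keyed PCollections.
class MergeSketch {

    // Returns a copy of master in which every key that also appears in
    // updates takes the updated value; all other entries are kept as-is.
    public static Map<String, String> merge(Map<String, String> master,
                                            Map<String, String> updates) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : master.entrySet()) {
            result.put(e.getKey(), updates.getOrDefault(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // a1: the original objects, keyed by an assumed identifier.
        Map<String, String> a1 = new LinkedHashMap<>();
        for (String k : new String[] {"o", "b", "c", "d", "e", "f", "g"}) {
            a1.put(k, k + "1");
        }

        // a2: the objects that dfn1 modified and emitted.
        Map<String, String> a2 = new LinkedHashMap<>();
        a2.put("b", "b2");
        a2.put("c", "c2");
        a2.put("g", "g2");

        // Prints [o1, b2, c2, d1, e1, f1, g2]
        System.out.println(merge(a1, a2).values());
    }
}
```

In the real pipeline both sides would be keyed PCollections rather than in-memory maps, so this only pins down the desired output, not how to compute it at scale.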

Is there a built-in mechanism to accomplish something like that? The collections may be keyed before the "merge".

Thanks in advance.

As I am unsatisfied with my English explanation of the problem, here is a DoFn that does what I am asking for. The real question is whether there is a built-in transform that can do something like this, ideally without manually creating a view beforehand.

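import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollectionView;

public class CombineKvCollectionsWithMasterCollection extends DoFn<KV<String, Object>, Object> {
    private static final long serialVersionUID = 4100849850259729106L;

    // Side input: a map view whose values take precedence over the main input.
    private final PCollectionView<Map<String, Object>> masterView;

    public CombineKvCollectionsWithMasterCollection(PCollectionView<Map<String, Object>> masterView) {
        this.masterView = masterView;
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        KV<String, Object> kv = c.element();
        Map<String, Object> masterMap = c.sideInput(masterView);
        if (masterMap.containsKey(kv.getKey())) {
            // The key exists in the side input: emit the side-input value instead.
            c.output(masterMap.get(kv.getKey()));
        } else {
            // No counterpart in the side input: pass the element's value through.
            c.output(kv.getValue());
        }
    }
}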
Malte

1 Answer


The built-in Combine functions cover basic aggregations such as Sum, Min, Max and Mean. For more specific combine functionality, you have to supply the processing logic yourself. So, for now, there is no built-in function that does this.