I'm trying to deduplicate input messages from Google Cloud Pub/Sub using the Deduplicate transform of Apache Beam. However, I run into an error after creating a KV<String, MyModel> pair and passing it to the Deduplicate transform.
Error:
ParDo requires a deterministic key coder in order to use state and timers
Code:
    PCollection<KV<String, MyModel>> deduplicatedEvents =
        messages
            .apply(
                "CreateKVPairs",
                ParDo.of(
                    new DoFn<MyModel, KV<String, MyModel>>() {
                      @ProcessElement
                      public void processElement(ProcessContext c) {
                        c.output(KV.of(c.element().getUniqueKey(), c.element()));
                      }
                    }))
            .apply(
                "Deduplicate",
                Deduplicate.<KV<String, MyModel>>values());
How should I create a deterministic coder that can encode and decode a String key to make this work?
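To make clear what I mean by "deterministic", here is a minimal plain-Java sketch of the property I think the error is asking for: equal keys must always encode to the same bytes, and decoding must give the original key back. It does not use Beam's Coder API at all; the class and method names (DeterministicKeyEncoding, encodeKey, decodeKey) are hypothetical and only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DeterministicKeyEncoding {

    // Hypothetical helper: encodes a String key as UTF-8 bytes.
    // Equal strings always produce identical byte arrays.
    static byte[] encodeKey(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: decodes the UTF-8 bytes back into the key.
    static String decodeKey(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two distinct String objects holding the same key value.
        byte[] first = encodeKey("event-123");
        byte[] second = encodeKey(new String("event-123"));

        // Same key value, same bytes: this is the property I understand
        // "deterministic" to mean for a key encoding.
        System.out.println(Arrays.equals(first, second));

        // Round trip returns the original key.
        System.out.println(decodeKey(first));
    }
}
```

My assumption is that a String key encoded this way already satisfies the requirement, so I don't understand why my pipeline is still rejected.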
Any input would be really helpful.