I have a streaming Scio pipeline running on Dataflow that deduplicates messages and performs some counting by key. When I try to drain the pipeline, I get a large number of None.get exceptions, apparently thrown in my deduplicate step (I am basing this on the step label I see in the Stackdriver log).
We are currently running Scio 0.7.0-beta1 and Beam 2.8.0. I have tried to guard my code against any potential Nones, but the exception appears to occur further down, inside the deduplicate step itself.
The error I am getting is the following:
"java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at com.spotify.scio.util.Functions$$anon$2.mergeAccumulators(Functions.scala:227)
at com.spotify.scio.util.Functions$$anon$2.mergeAccumulators(Functions.scala:220)
at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillCombiningState.getAccum(WindmillStateInternals.java:958)
at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillCombiningState.read(WindmillStateInternals.java:920)
at org.apache.beam.runners.core.SystemReduceFn.onTrigger(SystemReduceFn.java:125)
at org.apache.beam.runners.core.ReduceFnRunner.onTrigger(ReduceFnRunner.java:1060)
at org.apache.beam.runners.core.ReduceFnRunner.onTimers(ReduceFnRunner.java:768)
at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:95)
at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:42)
at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:115)
at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner.java:73)
at org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80)
at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:135)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:45)
at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:50)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:202)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:160)
at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1226)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:141)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:965)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
As you can see, the stack trace never enters my code, and I am unsure how to go about tracking this down. Perhaps it has something to do with the "LateDataDroppingDoFnRunner"? Our allowed lateness is relatively large (3 days, with one-hour windows).
The relevant part of the pipeline is:
val input = PubsubIO.readStrings()
  .fromSubscription(subscription)
  .withTimestampAttribute("ts")
  .withName("Window messages")
  .withFixedWindows(
    duration = windowSize,
    options = WindowOptions(
      trigger = AfterWatermark.pastEndOfWindow()
        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
          .plusDelayOf(earlyFiring))
        .withLateFirings(AfterProcessingTime.pastFirstElementInPane()
          .plusDelayOf(lateFiring)),
      accumulationMode = ACCUMULATING_FIRED_PANES,
      allowedLateness = allowedLateness
    )
  )
  .withName(s"Deduplicate messages")
  .distinctBy[String](f = getId)
  ...
// I am being overly cautious here because I have been having
// so much trouble debugging this
def getId(message: Map[String, Any]): String = {
  message match {
    case null => {
      logger.warn("message is null when getting id")
      ""
    }
    case message => {
      message.get("id") match {
        case None => {
          logger.warn("id is null in message")
          ""
        }
        case id => id.get.toString
      }
    }
  }
}
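To rule out getId itself, here is a minimal standalone sketch of the check I would expect to pass. It copies the function unchanged and swaps in a stub `logger` object that just prints; the stub and the `GetIdCheck` wrapper are assumptions made for this example, not what the pipeline actually uses.

```scala
object GetIdCheck {
  // Hypothetical stand-in for the pipeline's logger; it only prints.
  object logger {
    def warn(msg: String): Unit = println(s"WARN: $msg")
  }

  // Same logic as getId above, copied unchanged.
  def getId(message: Map[String, Any]): String = {
    message match {
      case null => {
        logger.warn("message is null when getting id")
        ""
      }
      case message => {
        message.get("id") match {
          case None => {
            logger.warn("id is null in message")
            ""
          }
          // Only reached when the Option is a Some, so .get cannot hit None.
          case id => id.get.toString
        }
      }
    }
  }

  def main(args: Array[String]): Unit = {
    assert(getId(null) == "")               // null message
    assert(getId(Map("other" -> 1)) == "")  // missing "id" key
    assert(getId(Map("id" -> 42)) == "42")  // non-string id
    assert(getId(Map("id" -> "abc")) == "abc")
    println("getId never called None.get")
  }
}
```

The one input this does not cover is a key that is present with a null value; that would fail with a NullPointerException on toString rather than with None.get, so it does not match the stack trace either.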
I am confused about how I could possibly be getting a None.get here, and why it would only occur while draining.
Can I get some advice on how to debug this error, or where I should be looking?