When my streaming Dataflow job runs for an extended period of time, it eventually fails with a "GC overhead limit exceeded" error that brings the job to a halt. What is the best way to debug this?
java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.google.cloud.dataflow.worker.repackaged.com.google.common.collect.HashBasedTable.create (HashBasedTable.java:76)
at com.google.cloud.dataflow.worker.WindmillTimerInternals.<init> (WindmillTimerInternals.java:53)
at com.google.cloud.dataflow.worker.StreamingModeExecutionContext$StepContext.start (StreamingModeExecutionContext.java:490)
at com.google.cloud.dataflow.worker.StreamingModeExecutionContext.start (StreamingModeExecutionContext.java:221)
at com.google.cloud.dataflow.worker.StreamingDataflowWorker.process (StreamingDataflowWorker.java:1058)
at com.google.cloud.dataflow.worker.StreamingDataflowWorker.access$1000 (StreamingDataflowWorker.java:133)
at com.google.cloud.dataflow.worker.StreamingDataflowWorker$8.run (StreamingDataflowWorker.java:841)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
at java.lang.Thread.run (Thread.java:745)
- Job ID: 2018-02-06_00_54_50-15974506330123401176
- SDK: Apache Beam SDK for Java 2.2.0
- Scio version: 0.4.7
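
For reference, here is a minimal sketch of how I could turn on worker heap dumps through the Dataflow debug options (assuming `setDumpHeapOnOOM` and `setSaveHeapDumpsToGcsPath` are available in this SDK version; the object name and GCS bucket are placeholders). Would that be a reasonable first step, or is there a better way to narrow down what is filling the heap?

```scala
import com.spotify.scio._
import org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions

object StreamingJob {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    // Ask the Dataflow workers to dump the heap when an OutOfMemoryError
    // is thrown and upload it to GCS for offline inspection
    // (bucket name below is hypothetical).
    val debugOptions = sc.options.as(classOf[DataflowPipelineDebugOptions])
    debugOptions.setDumpHeapOnOOM(true)
    debugOptions.setSaveHeapDumpsToGcsPath("gs://my-bucket/heap-dumps")

    // ... actual streaming pipeline definition elided ...

    sc.close()
  }
}
```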