
I have a Dataflow pipeline in which I consume Pub/Sub messages, process them, and then publish the results back to Pub/Sub.
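For reference, the pipeline is shaped roughly like the minimal sketch below (Java SDK; the project, subscription, and topic names as well as the processing body are simplified placeholders, not my actual code):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;

    public class PubsubToPubsub {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply("ReadFromPubsub",
                PubsubIO.readStrings().fromSubscription("projects/my-project/subscriptions/my-sub"))
         .apply("Process", ParDo.of(new DoFn<String, String>() {
            @ProcessElement
            public void processElement(@Element String msg, OutputReceiver<String> out) {
              // Per-message computation happens here; the error appears when this step gets heavier.
              out.output(msg);
            }
         }))
         .apply("WriteToPubsub",
                PubsubIO.writeStrings().to("projects/my-project/topics/my-topic"));

        p.run();
      }
    }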

Whenever I do too much computation (i.e. I increase the amount of processing per message), I get the following exception: java.util.concurrent.ExecutionException: org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusRuntimeException: CANCELLED: cancelled before receiving half close

What causes this error? How can I avoid it? Full stacktrace:

org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
        org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asRuntimeException(Status.java:517)
        org.apache.beam.vendor.grpc.v1p13p1.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:335)
        org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
        org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver.onNext(SynchronizedStreamObserver.java:46)
        org.apache.beam.runners.fnexecution.control.FnApiControlClient.handle(FnApiControlClient.java:84)
        org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.start(RegisterAndProcessBundleOperation.java:254)
        org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
        org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1283)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:147)
        org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1020)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        java.lang.Thread.run(Thread.java:745)
Jurgen Palsma
  • Can I ask which SDK (Python or Java) and which Beam version you are using (guessing it's streaming mode)? We have encountered similar issues with a Python streaming pipeline today, and it seems like the pipeline latency never goes down... – gingi007 Jul 23 '19 at 17:04
  • 1
    dd anyone manage to fix it? experiencing the same issue - streaming pipeline in python SDK. – SockworkOrange Mar 18 '20 at 15:43
  • Jurgen, did you resolve the issue? – Vlad Oct 05 '20 at 20:02
  • We noticed this error due to memory issues. Increasing worker memory helps sometimes. – pratpor May 13 '21 at 06:42

0 Answers