How to handle failures when publishing to pubsub using pubsub write in apache beam

Question

I'm developing an Apache beam pipeline to publish unbounded data into a pubsub topic. Publishing is done using pubsub IO connector PubsubIO.writeMessages().

If pubsub connection is failed during pipeline is processing, I need to capture the connection failure and identify the data which is being processed during the connection failure. But I couldn't find a straight forward failure handling mechanism in Apache beam pubsub write.

When I test this using a bad pubsub connection, pipeline is trying to connect throwing following exception for a while and if the connection is unsuccessful pipeline execution will fail.

com.google.api.gax.rpc.UnavailableException: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
    at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:69)
    at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
    at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
    at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
    at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1041)
    at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1215)
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
    at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:563)
    at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:533)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
    at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
    at io.grpc.Status.asRuntimeException(Status.java:535)
    ... 10 more
Caused by: io.grpc.netty.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: /127.0.0.1:58843
Caused by: java.net.ConnectException: Connection refused: no further information
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779)
    at io.grpc.netty.shaded.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.grpc.netty.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
    at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)

I tried to catch this exception from the pubsub write transform and it is not working either.

So my question is: Is there any way to capture above exception and continue pipeline until the connection is successful? My pubsub write code snippet is as follows:

public class PubSubWrite extends PTransform<PCollection<String>, PDone> {

    private final String outputTopic;

    public PubSubWrite(String outputTopic) {
        this.outputTopic = outputTopic;
    }

    @Override
    public PDone expand(PCollection<String> input) {
        return input
                .apply(
                        "convertMessagesToPubsubMessages",
                        MapElements.into(TypeDescriptor.of(PubsubMessage.class))
                                .via(
                                        (String json) ->
                                                new PubsubMessage(json.getBytes(Charsets.UTF_8), ImmutableMap.of("SOURCE", "TEST"))))
                .apply(
                        "writePubsubMessagesToPubSub", PubsubIO.writeMessages().to(outputTopic));
    }
}

ewertonvsilva · Answer 1 · 2021-11-30T12:08:13.757

There is not a native API for error handling in transforms for PubsubIO as you can see on the documentation.

I recommend you to open a feature request on issue tracker asking for a error handling implementation on the Java Library - PubsubIO connector.

While, you could return ans empty error collection or implement it to catch the exception by yourself.

Example for the empty error:

@Override
public WithFailures.Result < PDone, PubsubMessage > expand(PCollection < PubsubMessage > input) {
    PDone done = input //
        .apply(
            "convertMessagesToPubsubMessages",
            MapElements.into(TypeDescriptor.of(PubsubMessage.class))
            .via(
                (String json) - >
                new PubsubMessage(json.getBytes(Charsets.UTF_8), ImmutableMap.of("SOURCE", "TEST"))))
        .apply(
            "writePubsubMessagesToPubSub", PubsubIO.writeMessages().to(outputTopic));

    return WithFailures.Result.of(done, EmptyErrors.in(input.getPipeline()));
}



private static class EmptyErrors extends PTransform < PBegin, PCollection < PubsubMessage >> {

    /** Creates an empty error collection in the given pipeline. */
    public static PCollection < PubsubMessage > in (Pipeline pipeline) {
        return pipeline.apply(new EmptyErrors());
    }

    @Override
    public PCollection < PubsubMessage > expand(PBegin input) {
        return input.apply(Create.empty(PubsubMessageWithAttributesCoder.of()));
    }
}

score 0 · Answer 2 · answered Nov 30 '21 at 17:36

Usually such failures are retried by the runner. For example, Dataflow runner will retry failures indefinitely for streaming jobs. Note that this is in addition to any local (VM level) retries for errors that produce re-triable HTTP error codes (for example 5xx). So pipeline should continue once you fix the underlying issue. But note that your backlog might significantly increase if the pipeline is unable to process data for some time so you might see a delay.

How to handle failures when publishing to pubsub using pubsub write in apache beam

2 Answers2