I am not able to connect to Pub/Sub from a Flink job running on a Dataproc cluster. This is the code I am using to connect to Pub/Sub:

    {
        StreamExecutionEnvironment streamExecEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        streamExecEnv.setStateBackend(new RocksDBStateBackend("file:///tmp/checkpoints"));

        try
        {
            DeserializationSchema<String> deserializer = new SimpleStringSchema();
            SourceFunction<String> pubsubSource = PubSubSource.newBuilder()
                                                              .withDeserializationSchema(deserializer)
                                                              .withProjectName("vz-it-np-gudv-dev-vzntdo-0")
                                                              .withSubscriptionName("subscription1")
                                                              .build();
            streamExecEnv.addSource(pubsubSource).print();
        }
        catch(Exception e)
        {
            System.out.println("Flink Exception ----- :"+e);
        }
        streamExecEnv.enableCheckpointing();
        streamExecEnv.setRestartStrategy(RestartStrategies.fixedDelayRestart(10,0L));
        streamExecEnv.execute();
    }

The code builds and I can create the jar. When I run the jar on the Dataproc cluster with Flink 1.9.3, I get the error below in the YARN logs for the Flink application:

    at org.apache.flink.streaming.connectors.gcp.pubsub.DefaultPubSubSubscriberFactory.getSubscriber(DefaultPubSubSubscriberFactory.java:62)
    at org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource.createAndSetPubSubSubscriber(PubSubSource.java:178)
    at org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource.open(PubSubSource.java:100)
    at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:552)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:416)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: com.google.pubsub.v1.SubscriberGrpc
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at org.apache.flink.util.ChildFirstClassLoader.loadClass(ChildFirstClassLoader.java:69)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 10 more
2023-02-20 10:46:17,544 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionStrategy  - Calculating tasks to restart to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_93.
2023-02-20 10:46:17,544 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionStrategy  - 1 tasks should be restarted to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_93. 
2023-02-20 10:46:17,544 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source -> Sink: Print to Std. Out (72/128) (848d9c247dbd0b06aa542b1d1324fc57) switched from DEPLOYING to RUNNING.
2023-02-20 10:46:17,544 INFO  org.apache.flink.runtime.executiongraph.failover.AdaptedRestartPipelinedRegionStrategyNG  - Finally restart 1 tasks to recover from task failure.
2023-02-20 10:46:17,544 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source -> Sink: Print to Std. Out (94/128) (e593f0150ae97bedece9ddf55678ad39) switched from CREATED to SCHEDULED.
2023-02-20 10:46:17,544 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source -> Sink: Print to Std. Out (94/128) (e593f0150ae97bedece9ddf55678ad39) switched from SCHEDULED to DEPLOYING.
2023-02-20 10:46:17,544 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Source: Custom Source -> Sink: Print to Std. Out (94/128) (attempt #19) to container_e01_1676871913676_0005_01_000037 @ vz-it-np-gudv-dev-vzntdo-dp-lr-flink-w-4.us-east4-c.c.vz-it-np-gudv-dev-vzntdo-0.internal (dataPort=43499)
2023-02-20 10:46:17,547 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source -> Sink: Print to Std. Out (77/128) (ce4b9fabd6d4c95073addbf691583f03) switched from RUNNING to FAILED.
java.lang.NoClassDefFoundError: com/google/pubsub/v1/SubscriberGrpc
    at org.apache.flink.streaming.connectors.gcp.pubsub.DefaultPubSubSubscriberFactory.getSubscriber(DefaultPubSubSubscriberFactory.java:62)
    at org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource.createAndSetPubSubSubscriber(PubSubSource.java:178)
    at org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource.open(PubSubSource.java:100)
    at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:552)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:416)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
    at java.lang.Thread.run(Thread.java:750)

This is the dependency I am using for Pub/Sub:

    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-pubsub</artifactId>
        <version>1.122.2</version>
    </dependency>
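
To narrow this down, a small standalone check like the one below could show whether the class named in the `ClassNotFoundException` is visible on a given classpath. This is only a sketch: `ClasspathCheck` and `isLoadable` are hypothetical names I made up for illustration, and it assumes it is run with the same jar(s) on the classpath that are submitted to the cluster (for example, the job jar itself). The two class names are taken from the stack trace above.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by the current thread's context class loader, false otherwise.
    static boolean isLoadable(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names copied from the stack trace in the question
        String[] names = {
            "com.google.pubsub.v1.SubscriberGrpc",
            "org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource"
        };
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

If `SubscriberGrpc` is reported as not loadable when only the job jar is on the classpath, that would suggest the jar does not bundle the transitive Pub/Sub gRPC classes, which would be consistent with the error above. I have not confirmed that this is the cause.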