I am not able to connect to Pub/Sub from a Flink job running on a Dataproc cluster. This is the code I am using to connect to Pub/Sub:
{
    StreamExecutionEnvironment streamExecEnv = StreamExecutionEnvironment.getExecutionEnvironment();
    streamExecEnv.setStateBackend(new RocksDBStateBackend("file:///tmp/checkpoints"));
    try
    {
        DeserializationSchema<String> deserializer = new SimpleStringSchema();
        SourceFunction<String> pubsubSource = PubSubSource.newBuilder()
            .withDeserializationSchema(deserializer)
            .withProjectName("vz-it-np-gudv-dev-vzntdo-0")
            .withSubscriptionName("subscription1")
            .build();
        streamExecEnv.addSource(pubsubSource).print();
    }
    catch (Exception e)
    {
        System.out.println("Flink Exception ----- :" + e);
    }
    streamExecEnv.enableCheckpointing();
    streamExecEnv.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 0L));
    streamExecEnv.execute();
}
I am able to build the code and create the jar. When I run the jar on the Dataproc cluster with Flink 1.9.3, I get the error below in the Flink application logs on YARN:
at org.apache.flink.streaming.connectors.gcp.pubsub.DefaultPubSubSubscriberFactory.getSubscriber(DefaultPubSubSubscriberFactory.java:62)
at org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource.createAndSetPubSubSubscriber(PubSubSource.java:178)
at org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource.open(PubSubSource.java:100)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:552)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:416)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: com.google.pubsub.v1.SubscriberGrpc
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.flink.util.ChildFirstClassLoader.loadClass(ChildFirstClassLoader.java:69)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 10 more
2023-02-20 10:46:17,544 INFO org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionStrategy - Calculating tasks to restart to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_93.
2023-02-20 10:46:17,544 INFO org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionStrategy - 1 tasks should be restarted to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_93.
2023-02-20 10:46:17,544 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Sink: Print to Std. Out (72/128) (848d9c247dbd0b06aa542b1d1324fc57) switched from DEPLOYING to RUNNING.
2023-02-20 10:46:17,544 INFO org.apache.flink.runtime.executiongraph.failover.AdaptedRestartPipelinedRegionStrategyNG - Finally restart 1 tasks to recover from task failure.
2023-02-20 10:46:17,544 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Sink: Print to Std. Out (94/128) (e593f0150ae97bedece9ddf55678ad39) switched from CREATED to SCHEDULED.
2023-02-20 10:46:17,544 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Sink: Print to Std. Out (94/128) (e593f0150ae97bedece9ddf55678ad39) switched from SCHEDULED to DEPLOYING.
2023-02-20 10:46:17,544 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Source: Custom Source -> Sink: Print to Std. Out (94/128) (attempt #19) to container_e01_1676871913676_0005_01_000037 @ vz-it-np-gudv-dev-vzntdo-dp-lr-flink-w-4.us-east4-c.c.vz-it-np-gudv-dev-vzntdo-0.internal (dataPort=43499)
2023-02-20 10:46:17,547 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Sink: Print to Std. Out (77/128) (ce4b9fabd6d4c95073addbf691583f03) switched from RUNNING to FAILED.
java.lang.NoClassDefFoundError: com/google/pubsub/v1/SubscriberGrpc
at org.apache.flink.streaming.connectors.gcp.pubsub.DefaultPubSubSubscriberFactory.getSubscriber(DefaultPubSubSubscriberFactory.java:62)
at org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource.createAndSetPubSubSubscriber(PubSubSource.java:178)
at org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource.open(PubSubSource.java:100)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:552)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:416)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:750)
This is the dependency I am using for Pub/Sub:
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-pubsub</artifactId>
    <version>1.122.2</version>
</dependency>
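To narrow this down, here is a minimal sketch of a check that could be run with the same jar and classpath as the job, to see whether the class named in the error is visible at all. The class name `ClasspathCheck` and the helper `isLoadable` are hypothetical names chosen for illustration. As far as I know, `SubscriberGrpc` ships in the `grpc-google-cloud-pubsub-v1` artifact, which `google-cloud-pubsub` pulls in transitively, so a `false` result would suggest the job jar was built without its transitive dependencies. That is an assumption, not something the logs above confirm.

```java
class ClasspathCheck {
    // Returns true if the named class can be found by the given class loader.
    // The class is located but not initialized (second argument is false).
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        String[] names = {
            // Class reported missing in the stack trace.
            "com.google.pubsub.v1.SubscriberGrpc",
            // Connector class that appears in the stack trace.
            "org.apache.flink.streaming.connectors.gcp.pubsub.PubSubSource"
        };
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name, loader));
        }
    }
}
```

Running it with the job jar on the classpath only shows what that jar provides by itself. The classpath of the Flink TaskManagers on the cluster may differ.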