
I have two Flink jobs in one application: 1) the first is a Flink batch job that sends events to Kafka, which someone else then writes to S3; 2) the second is a Flink batch job that validates the generated data (it reads from S3).

Considerations: these two jobs work fine separately. When combined, only the first job completes and sends events to Kafka. The second fails when I traverse the result of the SQL query.

...
//First job
  val env = org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.getExecutionEnvironment
...
  //Creates Datastream from generated events and gets the store

  streamingDataset.write(store)
  env.execute()
...

// Second job
  val flinkEnv = org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getExecutionEnvironment

  val batchStream: DataStream[RowData] =
    FlinkSource.forRowData()
      .env(flinkEnv)
      .tableLoader(tableLoader)
      .streaming(false)
      .build()

  val tableEnv = StreamTableEnvironment.create(flinkEnv)

  val inputTable = tableEnv.fromDataStream(batchStream)
  tableEnv.createTemporaryView("InputTable", inputTable)
  val resultTable: TableResult = tableEnv
    .sqlQuery("SELECT * FROM InputTable")
    .fetch(3)
    .execute()
  val results: CloseableIterator[Row] = resultTable.collect()
  while (results.hasNext) {
    val event = results.next()
    println("Result test " + event)
  }
...

org.apache.flink.streaming.api.operators.collect.CollectResultFetcher [] - An exception occurred when fetching query results
java.lang.NullPointerException: Unknown operator ID. This is a bug.
    at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:76)
    at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.sendRequest(CollectResultFetcher.java:166)
    at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:129)
    at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:106)
    at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:80)
    at org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:222) ~[?:?]

I want to have two jobs in one application so the generated data stays in memory (so I don't have to take care of saving it anywhere else). Is it possible to combine these two jobs, or do I have to run them separately? Or is there a better way to restructure my code to make it work?
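To show what I mean by "combining", here is a minimal sketch of the structure I'm aiming for: both jobs in one main method, run strictly one after the other, each on its own execution environment so the two job graphs don't mix. The bodies are elided; `store` and `tableLoader` are placeholders for the setup shown in the snippets above.

```scala
// Sketch only: two sequential batch jobs in a single application.
// `store` and `tableLoader` stand in for the setup shown above.
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment

def main(args: Array[String]): Unit = {
  // Job 1: generate events and write them out.
  // env.execute() blocks until this batch job finishes.
  val env1 = StreamExecutionEnvironment.getExecutionEnvironment
  // ... build streamingDataset and call streamingDataset.write(store) ...
  env1.execute("job-1-generate-events")

  // Job 2: starts only after job 1 has completed, on a fresh
  // environment so its job graph is independent of job 1's.
  val env2 = StreamExecutionEnvironment.getExecutionEnvironment
  val tableEnv = StreamTableEnvironment.create(env2)
  // ... build batchStream from tableLoader, register "InputTable",
  //     run the SELECT, and iterate the CloseableIterator ...
}
```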

Ana Bo
  • I don't understand what you mean by "to have generated data in-memory". How is any data kept "in memory" from the first job? And just FYI, there's no good way to use memory to share results between two Flink jobs. – kkrugler Jan 27 '23 at 20:08

0 Answers