
I am using the Flink 1.15 DataStream API for an ETL job. I want to run the job in BATCH execution mode, so I used the code provided on the official website: env.setRuntimeMode(RuntimeExecutionMode.BATCH);. However, I encountered the following error:

java.lang.UnsupportedOperationException at org.apache.flink.runtime.io.network.partition.ResultPartition.getAllDataProcessedFuture(ResultPartition.java:233)

My whole code logic:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> text = env.readTextFile("file:///path/to/file");

env.setRuntimeMode(RuntimeExecutionMode.BATCH);

DataStream<OutputType> result = text
    .map(/* map logic here */)
    .keyBy(/* keyby logic here */)
    .reduce(/* reduce logic here */);

result.writeAsText("filePath");

env.execute();

Can anyone provide some insights on why I am getting this error and how to resolve it? Thanks!


Background of my project (if you want to know more about why I want to use batch mode):

I am currently working on a job that reads data from S3, then performs some transformations and keyed reductions on the data. In the process, my application seems to emit every intermediate reduction result rather than just the final reduced value for each key. I understand that this is likely due to the nature of streaming execution, which continuously processes events as they arrive. My situation is quite similar to this post: https://stackoverflow.com/questions/58828218/how-to-avoid-duplicate-key-tuples-in-word-count-w-apache-flink

So I want to change to batch mode to see if it works.
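To illustrate what I mean by intermediate results, here is a minimal, self-contained sketch of a keyed count (the input elements and class name are made up for illustration). In streaming mode, the reduce emits one updated record per incoming element, e.g. (a,1), (a,2), (b,1), (a,3); in batch mode I expect only the final count per key:

```java
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IntermediateResultsDemo {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Integer>> counts = env
                .fromElements("a", "a", "b", "a")
                // map each word to (word, 1); returns() is needed because
                // lambda type information is erased at compile time
                .map(word -> Tuple2.of(word, 1))
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .keyBy(t -> t.f0)
                // in streaming mode this emits a running total for every element
                .reduce((ReduceFunction<Tuple2<String, Integer>>) (a, b) ->
                        Tuple2.of(a.f0, a.f1 + b.f1));

        counts.print();
        env.execute("intermediate-results-demo");
    }
}
```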


What I tried:

  1. I removed the transformation logic, but still get the same error as above:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> text = env.readTextFile("file:///path/to/file");

env.setRuntimeMode(RuntimeExecutionMode.BATCH);

text.writeAsText("filePath");

env.execute();
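  2. I also sketched a variant using the new FileSource connector, since my understanding is that readTextFile is a deprecated legacy source in 1.15 and the batch runtime expects sources built on the new Source API. I have not yet confirmed that this resolves the error; the paths are placeholders from my original snippet:

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BatchFileRead {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // New Source API: without a monitoring interval this is a bounded
        // source, which batch execution can treat as finite input.
        FileSource<String> source = FileSource
                .forRecordStreamFormat(new TextLineInputFormat(),
                        new Path("file:///path/to/file"))
                .build();

        DataStream<String> text =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source");

        text.writeAsText("filePath");
        env.execute();
    }
}
```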
