I am using the Flink 1.15 DataStream API for an ETL job. I want to run the job in BATCH execution mode, so I used the snippet from the official website: `env.setRuntimeMode(RuntimeExecutionMode.BATCH);`
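(According to the docs, the runtime mode can also be set on the command line when submitting the job, e.g. `bin/flink run -Dexecution.runtime-mode=BATCH <jarFile>`, but here I am setting it in code.)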
However, I encountered the following error:
```
java.lang.UnsupportedOperationException
    at org.apache.flink.runtime.io.network.partition.ResultPartition.getAllDataProcessedFuture(ResultPartition.java:233)
```
My whole code logic:

```java
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> text = env.readTextFile("file:///path/to/file");
env.setRuntimeMode(RuntimeExecutionMode.BATCH);

DataStream<OutputType> result = text
        .map(/* map logic here */)
        .keyBy(/* keyBy logic here */)
        .reduce(/* reduce logic here */);

result.writeAsText("filePath");

// Actually run the job.
env.execute("my-etl-job");
```
Can anyone provide some insight into why I am getting this error and how to resolve it? Thanks!
Background on my project (in case you want to know more about why I want to use batch mode):
I am currently working on a job that reads data from S3 and performs keyed transformations and reductions on the data. In the process, I am running into a problem where my application seems to emit every intermediate reduction result rather than just the final reduced value for each key. I understand that this is likely due to the nature of streaming execution, which continuously processes events as they arrive. My situation is quite similar to this post: https://stackoverflow.com/questions/58828218/how-to-avoid-duplicate-key-tuples-in-word-count-w-apache-flink
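To illustrate the behavior, here is a minimal word-count-style sketch (class name and input values are made up for illustration): in streaming mode the keyed reduce emits a running total for every incoming element, whereas I only want the final value per key.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RollingReduceDemo {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "a", "a")
                .map(word -> Tuple2.of(word, 1))
                // Type hint needed because Java erases the Tuple2 generics in the lambda.
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .keyBy(t -> t.f0)
                .reduce((x, y) -> Tuple2.of(x.f0, x.f1 + y.f1))
                // Streaming mode prints (a,1), (a,2), (a,3); I want only (a,3).
                .print();

        env.execute("rolling-reduce-demo");
    }
}
```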
So I want to switch to batch mode to see if that solves the problem.
What I have tried:

- I removed the transformation logic entirely, but I still get the same error as above:

```java
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> text = env.readTextFile("file:///path/to/file");
env.setRuntimeMode(RuntimeExecutionMode.BATCH);

text.writeAsText("filePath");

env.execute("my-etl-job");
```