
Spark's RDD.persist(...) can help avoid duplicated RDD evaluation.

Is there an equivalent feature in Flink?

Specifically, if I write code like the following, will Flink evaluate dataStream once or twice?

val dataStream = env.addSource(...).filter(...).flatMap(...)

val s1 = dataStream.keyBy(key1).timeWindow(...).aggregate(...)
val s2 = dataStream.keyBy(key2).timeWindow(...).reduce(...)
Grant

1 Answer


There is no need for persist in Flink: a DataStream to which multiple operators are applied is evaluated once, and every outgoing record is replicated to each downstream operator.

In your case, the program is executed as

                                 /-hash-> keyBy(key1) -> ...
 Source -> Filter -> FlatMap ->-<
                                 \-hash-> keyBy(key2) -> ...
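To make the fan-out concrete, here is a minimal sketch of a complete job with that shape, assuming Flink's Scala DataStream API. The source elements, key selectors, window size, and reduce functions are placeholders; `timeWindow` is used to match the question's snippet, although newer Flink versions replace it with `window(TumblingProcessingTimeWindows.of(...))`:

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object FanOutJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // This pipeline is executed once; its output records are replicated
    // to every downstream consumer, so no persist call is needed.
    val dataStream: DataStream[(String, Int)] = env
      .fromElements(("a", 1), ("b", 2), ("a", 3)) // placeholder source
      .filter(_._2 > 0)
      .flatMap(x => Seq(x))

    // Two consumers keyed on different fields: each receives
    // its own copy of every record emitted above.
    val s1 = dataStream
      .keyBy(_._1)
      .timeWindow(Time.seconds(5))
      .reduce((a, b) => (a._1, a._2 + b._2))

    val s2 = dataStream
      .keyBy(_._2)
      .timeWindow(Time.seconds(5))
      .reduce((a, b) => (a._1, a._2 max b._2))

    s1.print()
    s2.print()

    env.execute("fan-out example")
  }
}
```

Note that nothing runs until env.execute() is called; at that point Flink builds a single dataflow graph in which the source-filter-flatMap chain appears once, with two hash-partitioned outgoing edges.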
Mikalai Lushchytski