What you can do is write your job so that the sources and sinks are pluggable, and then implement suitable sources and sinks for testing. In other words, something like this:
```java
public class TestableStreamingJob {
    private SourceFunction<Long> source;
    private SinkFunction<Long> sink;

    public TestableStreamingJob(SourceFunction<Long> source, SinkFunction<Long> sink) {
        this.source = source;
        this.sink = sink;
    }

    public void execute() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Long> longStream =
                env.addSource(source).returns(TypeInformation.of(Long.class));

        longStream
                .map(new IncrementMapFunction())
                .addSink(sink);

        env.execute();
    }

    public static void main(String[] args) throws Exception {
        TestableStreamingJob job =
                new TestableStreamingJob(new RandomLongSource(), new PrintSinkFunction<>());
        job.execute();
    }
}
```
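The `IncrementMapFunction` isn't shown above. Judging from the test expectations further down (1 → 2, 10 → 11, -10 → -9), it simply adds one to each element, so a minimal sketch would be:

```java
import org.apache.flink.api.common.functions.MapFunction;

// Minimal sketch: a stateless MapFunction that increments each Long by one,
// consistent with the expectations asserted in the test below.
public class IncrementMapFunction implements MapFunction<Long, Long> {
    @Override
    public Long map(Long value) throws Exception {
        return value + 1;
    }
}
```

Because it implements nothing beyond `MapFunction`, this class can also be unit tested directly, without any Flink runtime, by just calling `map()`.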
which can then be tested like this:
```java
public class TestableStreamingJobTest {

    @ClassRule
    public static MiniClusterWithClientResource flinkCluster =
            new MiniClusterWithClientResource(
                    new MiniClusterResourceConfiguration.Builder()
                            .setNumberSlotsPerTaskManager(2)
                            .setNumberTaskManagers(1)
                            .build());

    @Test
    public void testCompletePipeline() throws Exception {
        ParallelSourceFunction<Long> source =
                new ParallelCollectionSource(Arrays.asList(1L, 10L, -10L));
        SinkCollectingLongs sink = new SinkCollectingLongs();
        TestableStreamingJob job = new TestableStreamingJob(source, sink);

        job.execute();

        assertThat(sink.result).containsExactlyInAnyOrder(2L, 11L, -9L);
    }
}
```
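`ParallelCollectionSource` also comes from the linked repository: it is a bounded, parallel source that plays back a fixed collection and then terminates, so that `job.execute()` returns and the test can make its assertions. A sketch (not the repository's exact code) might split the collection across subtasks like this:

```java
import java.util.List;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

// Sketch of a finite, parallel test source: each subtask emits its share of
// the collection and then returns, so the job finishes on its own.
public class ParallelCollectionSource extends RichParallelSourceFunction<Long> {

    private final List<Long> input;

    public ParallelCollectionSource(List<Long> input) {
        this.input = input;
    }

    @Override
    public void run(SourceContext<Long> ctx) {
        int subtask = getRuntimeContext().getIndexOfThisSubtask();
        int parallelism = getRuntimeContext().getNumberOfParallelSubtasks();
        // Round-robin partitioning: element i is emitted by subtask (i % parallelism),
        // so each element is produced exactly once across all subtasks.
        for (int i = 0; i < input.size(); i++) {
            if (i % parallelism == subtask) {
                ctx.collect(input.get(i));
            }
        }
    }

    @Override
    public void cancel() {
        // Nothing to do: run() terminates by itself once the list is exhausted.
    }
}
```

The important property is that the source is *bounded*: a never-ending source (like the `RandomLongSource` used in `main`) would keep the job running forever and the test would never reach its assertions.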
where the sink used for testing is something like this:
```java
public class SinkCollectingLongs implements SinkFunction<Long> {

    public static final List<Long> result =
            Collections.synchronizedList(new ArrayList<>());

    @Override
    public void invoke(Long value, Context context) throws Exception {
        result.add(value);
    }
}
```

The collection is `static` because Flink serializes the sink and ships copies of it to the task managers; since the mini cluster runs in the same JVM as the test, a static field is a simple way for the test to observe what those copies produced. It is synchronized because the sink may run with parallelism greater than one.
This example is lifted from https://github.com/knaufk/flink-testing-pyramid, which you can consult for more details.
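Note that `MiniClusterWithClientResource` lives in Flink's test utilities, so you will need a test-scoped dependency along these lines (the exact artifact name, and whether it carries a Scala-version suffix, varies between Flink versions, so check the documentation for the version you are using):

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-test-utils</artifactId>
    <version>${flink.version}</version>
    <scope>test</scope>
</dependency>
```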