I've used Apache Flink for batch processing for a while, but now we want to convert this batch job to a streaming job. The problem I run into is how to run end-to-end tests.

How it worked in a batch job

When using batch processing, we created end-to-end tests using Cucumber.

  • Fill up the HBase table we read from
  • Run the batch job
  • Wait for it to finish
  • Verify the result

The problem in a streaming job

We would like to do something similar with the streaming job, except the streaming job never really finishes.

So:

  • Fill up the message queue we read from
  • Run the streaming job
  • Wait for it to finish (how?)
  • Verify the result

We could just wait 5 seconds after every test and assume everything has been processed, but that would slow everything down a lot.

Question:

What are some ways or best practices to run end-to-end tests on a streaming Flink job without forcibly terminating the job after x seconds?

Richard Deurwaarder
  • This was also answered here: https://stackoverflow.com/questions/44441153/how-to-stop-a-flink-streaming-job-from-program/75961706#75961706 – r_g_s_ Apr 07 '23 at 20:30

1 Answer

Most Flink DataStream sources, if they are reading from a finite input, will inject a watermark with value Long.MAX_VALUE when they reach the end, after which the job will terminate on its own.
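For example (a minimal sketch, assuming JUnit 4 and a Flink version recent enough to have DataStream#executeAndCollect), a test can feed the job from a bounded source and simply wait for execution to return instead of sleeping:

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    import org.junit.Test;

    import java.util.Arrays;
    import java.util.List;

    import static org.junit.Assert.assertEquals;

    public class BoundedInputTest {

        @Test
        public void jobFinishesWhenBoundedInputIsExhausted() throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(1);

            // fromElements is a finite source: once it has emitted everything it
            // sends a Long.MAX_VALUE watermark and the pipeline shuts down.
            DataStream<Integer> doubled = env
                    .fromElements(1, 2, 3)
                    .map(x -> x * 2)
                    .returns(Types.INT);

            // executeAndCollect runs the job and returns once the bounded
            // source is exhausted, so no arbitrary sleep is needed.
            List<Integer> results = doubled.executeAndCollect(10);

            assertEquals(Arrays.asList(2, 4, 6), results);
        }
    }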

The Flink training exercises illustrate one approach to doing end-to-end testing of Flink jobs. I suggest cloning the GitHub repo and looking at how the tests are set up. They use a custom source and sink and redirect the input and output for testing.
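The core of that pattern looks roughly like the sketch below (a hypothetical CollectingSink, not the actual class from the training repo): the job writes to a test sink that stores results in a static collection, which works because local test execution runs everything in a single JVM.

    import org.apache.flink.streaming.api.functions.sink.SinkFunction;

    import java.util.concurrent.ConcurrentLinkedQueue;

    // Test-only sink: collects whatever the job produces so the test can
    // inspect it after env.execute() returns. The static field is shared
    // across parallel sink instances because local test execution runs
    // everything in one JVM.
    public class CollectingSink<T> implements SinkFunction<T> {

        public static final ConcurrentLinkedQueue<Object> VALUES = new ConcurrentLinkedQueue<>();

        @Override
        public void invoke(T value, Context context) {
            VALUES.add(value);
        }
    }

A test then swaps the real source and sink for these test versions, calls env.execute() (which returns because the test source is finite), and asserts on CollectingSink.VALUES.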

This topic is also discussed a bit in the documentation.
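The documented approach spins up an embedded Flink mini cluster for the test suite (sketch below, assuming JUnit 4 and the flink-test-utils dependency):

    import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
    import org.apache.flink.test.util.MiniClusterWithClientResource;

    import org.junit.ClassRule;

    public class ExampleIntegrationTest {

        // Starts a local Flink cluster once for the whole test class;
        // jobs submitted via env.execute() in the tests run on it.
        @ClassRule
        public static MiniClusterWithClientResource flinkCluster =
                new MiniClusterWithClientResource(
                        new MiniClusterResourceConfiguration.Builder()
                                .setNumberSlotsPerTaskManager(2)
                                .setNumberTaskManagers(1)
                                .build());
    }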

David Anderson