Apache Flink: How to update Source Function in Unit Test?

Question

I need my Flink job to read from a local SourceFunction instance and to produce updated results every time the data in that instance changes, driven from within the unit-test code itself rather than from an external stream.

Pseudocode:

StreamExecutionEnvironment env = ...getExecutionEnvironment();
StockSource src = new StockSource(); // the Source Function instance
env.addSource(src);
results = Pipeline(env); // does some calculations and returns the calculated data
env.execute();


// Test 1
When: src.sendData("TWTR", 120.6);
Assert: results.eurRate == 98.87;

// Test 2
When: src.sendData("GOOG", 300);
Assert: results.eurRate == 245.95;

Is doing something like this even possible in Flink?

score 1 · Answer 1 · answered Dec 18 '20 at 10:33

What you can do is write your job so that its sources and sinks are pluggable, and then implement suitable sources and sinks for testing. In other words, something like this:

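import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.PrintSinkFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// IncrementMapFunction and RandomLongSource come from the flink-testing-pyramid
// repository linked at the end of this answer.
public class TestableStreamingJob {
  private final SourceFunction<Long> source;
  private final SinkFunction<Long> sink;

  // Source and sink are injected, so tests can swap in their own implementations.
  public TestableStreamingJob(SourceFunction<Long> source, SinkFunction<Long> sink) {
    this.source = source;
    this.sink = sink;
  }

  public void execute() throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<Long> longStream = env.addSource(source).returns(TypeInformation.of(Long.class));

    longStream
      .map(new IncrementMapFunction())
      .addSink(sink);

    env.execute();
  }

  // Production wiring: a random source and a sink that prints to stdout.
  public static void main(String[] args) throws Exception {
    TestableStreamingJob job = new TestableStreamingJob(new RandomLongSource(), new PrintSinkFunction<>());
    job.execute();
  }
}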

which can then be tested like this:

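import static org.assertj.core.api.Assertions.assertThat;

import java.util.Arrays;
import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
import org.apache.flink.streaming.api.functions.source.ParallelSourceFunction;
import org.apache.flink.test.util.MiniClusterWithClientResource;
import org.junit.ClassRule;
import org.junit.Test;

// ParallelCollectionSource is a test helper from the same repository.
public class TestableStreamingJobTest {
  // One local mini cluster shared by all tests in this class.
  @ClassRule
  public static MiniClusterWithClientResource flinkCluster =
      new MiniClusterWithClientResource(
          new MiniClusterResourceConfiguration.Builder()
              .setNumberSlotsPerTaskManager(2)
              .setNumberTaskManagers(1)
              .build());

  @Test
  public void testCompletePipeline() throws Exception {
    ParallelSourceFunction<Long> source = new ParallelCollectionSource(Arrays.asList(1L, 10L, -10L));
    SinkCollectingLongs sink = new SinkCollectingLongs();
    TestableStreamingJob job = new TestableStreamingJob(source, sink);

    // The result list is static, so clear anything left over from earlier tests.
    SinkCollectingLongs.result.clear();
    job.execute();

    // Every input value has been incremented by one.
    assertThat(SinkCollectingLongs.result).containsExactlyInAnyOrder(2L, 11L, -9L);
  }
}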

where the sink used for testing is something like this:

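import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class SinkCollectingLongs implements SinkFunction<Long> {

  // Static because Flink serializes the sink and runs copies of it; an instance
  // field on the object held by the test would never see the results.
  public static final List<Long> result =
      Collections.synchronizedList(new ArrayList<>());

  @Override
  public void invoke(Long value, Context context) throws Exception {
    result.add(value);
  }
}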

This example is lifted from https://github.com/knaufk/flink-testing-pyramid, which you can consult for more details.
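Why does the test sink keep its results in a static list? Flink serializes user functions and runs deserialized copies inside its tasks, so anything written to an instance field lands in the copy, not in the object the test holds. The stdlib-only sketch below imitates that with a plain Java serialization round trip. CollectingSink and roundTrip are hypothetical names, and the round trip is only an approximation of what happens when Flink ships a function to a task.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SerializedCopyDemo {

  // Hypothetical stand-in for a sink; Flink's SinkFunction is Serializable too.
  public static class CollectingSink implements Serializable {
    private static final long serialVersionUID = 1L;

    // Static: shared by every copy in this JVM (the trick SinkCollectingLongs uses).
    public static final List<Long> STATIC_RESULT =
        Collections.synchronizedList(new ArrayList<>());

    // Instance field: each deserialized copy gets its own list.
    public final List<Long> instanceResult = new ArrayList<>();

    public void invoke(Long value) {
      STATIC_RESULT.add(value);
      instanceResult.add(value);
    }
  }

  // Serialize and deserialize, roughly imitating how a function is shipped to a task.
  @SuppressWarnings("unchecked")
  public static <T extends Serializable> T roundTrip(T obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    CollectingSink heldByTest = new CollectingSink();   // the object the test constructs
    CollectingSink runningCopy = roundTrip(heldByTest); // the copy that actually receives data
    runningCopy.invoke(2L);
    runningCopy.invoke(11L);

    System.out.println(heldByTest.instanceResult);      // prints []
    System.out.println(CollectingSink.STATIC_RESULT);   // prints [2, 11]
  }
}
```

The instance list on the test's object stays empty while the static list sees every value, which is why the assertion in the test reads from the static field.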

Is there an example of a custom source that can be used in unit tests? — Jayesh Lalwani, Sep 07 '21 at 21:05
I'm not sure what you're looking for, but maybe take a look at how the sources are used in the tests in the flink training exercises: https://github.com/ververica/flink-training. E.g., look at TaxiFareGenerator.runFor and ParallelTestSource, and the tests that use those. — David Anderson, Sep 07 '21 at 22:05
Thanks. I ended up implementing something similar that is a little more generic and can be used for different tests. — Jayesh Lalwani, Sep 15 '21 at 17:32

score 0 · Answer 2 · answered Sep 15 '21 at 17:57

I implemented my own custom source that wraps a Queue.

import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// This function has to wrap static members because of the way Flink does parallelism:
// the instance that runs is a serialized copy, not the one the test holds.
// SpotBugs doesn't like mutable static fields,
// so we suppress its warnings on the class that wraps this queue.
@SuppressFBWarnings
public final class QueueBasedSourceFunction<T>
    implements SourceFunction<T>, ResultTypeQueryable<T> {

  public static BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1024);
  // volatile: cancel() is called from a different thread than run()
  private static volatile boolean running = false;

  private final Class<T> clazz;

  public QueueBasedSourceFunction(Class<T> clazz) {
    this.clazz = clazz;
  }

  @Override
  @SuppressWarnings("unchecked")
  public void run(SourceContext<T> sourceContext) throws Exception {
    running = true;
    while (running) {
      // Poll with a timeout so that cancel() is noticed within about a second.
      T elem = (T) queue.poll(1, TimeUnit.SECONDS);
      if (elem != null) {
        // Emit under the checkpoint lock, as the SourceFunction contract recommends.
        synchronized (sourceContext.getCheckpointLock()) {
          sourceContext.collect(elem);
        }
      }
    }
  }

  @Override
  public void cancel() {
    running = false;
  }

  @Override
  public TypeInformation<T> getProducedType() {
    return TypeInformation.of(clazz);
  }

  public void produce(T s) {
    queue.offer(s);
  }

  // Blocks until the source has polled every queued element.
  public void waitTillConsumed() throws InterruptedException {
    synchronized (queue) {
      while (!queue.isEmpty()) {
        queue.wait(100);
      }
    }
  }
}

This source reads elements from the queue and emits them. In your test, you need to feed the queue, something like this:

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // configure your test environment
    env.setParallelism(2);

    QueueBasedSourceFunction<String> sourceFunc = new QueueBasedSourceFunction<>(String.class);
    DataStreamSource<String> source = env.addSource(sourceFunc);
    SingleOutputStreamOperator<String> result = source ..... // do whatever you need to do here
    result.addSink(sink());

    // Start a background thread that feeds test data into the queue.
    // You can add waits to simulate real data arriving over time.
    ExecutorService feeder = Executors.newSingleThreadExecutor();
    feeder.submit(
        () -> {
          IntStream.range(1, 10)
              .forEach(
                  i -> {
                    QueueBasedSourceFunction.queue.offer("Foo" + i);
                    QueueBasedSourceFunction.queue.offer("Bar" + i);
                    try {
                      Thread.sleep(2000);
                    } catch (InterruptedException e) {
                      return;
                    }
                  });
          try {
            Thread.sleep(10000);
          } catch (InterruptedException e) {
            return;
          }
          QueueBasedSourceFunction.queue.offer("Close");

          // Wait for the queue to be empty before stopping the source;
          // if the source is stopped too early, records won't be processed.
          try {
            sourceFunc.waitTillConsumed();
          } catch (InterruptedException e) {
            return;
          }
          // Cancel the source. env.execute() won't return until the source stops.
          sourceFunc.cancel();
        });

    // execute (blocks until the source has been cancelled)
    env.execute();
    feeder.shutdown();

This test generates two records every 2 seconds for about 18 seconds (nine iterations of IntStream.range(1, 10)), waits 10 seconds, sends one more record, waits until all records have been consumed, and then cancels the source. You can implement your own logic.
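The run loop and the feed, wait, cancel handshake above do not depend on Flink, so they can be tried in isolation. Below is a stdlib-only sketch, assuming a java.util.function.Consumer as a stand-in for SourceContext.collect; QueueDrainer and runOnce are hypothetical names. Unlike the answer's class, it sets the running flag before the loop starts rather than inside run(), so a cancel() that arrives before run() begins is not overwritten.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class QueueFeedDemo {

  // Hypothetical stand-in for QueueBasedSourceFunction; `emit` plays the role of SourceContext.collect.
  static final class QueueDrainer<T> {
    final BlockingQueue<T> queue = new ArrayBlockingQueue<>(1024);
    // Set before run() starts, so an early cancel() is not overwritten.
    private volatile boolean running = true;

    void run(Consumer<T> emit) throws InterruptedException {
      while (running) {
        // Timed poll rather than take(), so the loop re-checks `running` at least every 100 ms.
        T elem = queue.poll(100, TimeUnit.MILLISECONDS);
        if (elem != null) {
          emit.accept(elem);
        }
      }
    }

    void cancel() {
      running = false;
    }

    // Same idea as waitTillConsumed(): re-check emptiness after each timed wait.
    void waitTillConsumed() throws InterruptedException {
      synchronized (queue) {
        while (!queue.isEmpty()) {
          queue.wait(50);
        }
      }
    }
  }

  // Feeds three elements from a background thread, waits until they are consumed, then cancels.
  public static List<String> runOnce() {
    QueueDrainer<String> drainer = new QueueDrainer<>();
    List<String> seen = new CopyOnWriteArrayList<>();
    ExecutorService feeder = Executors.newSingleThreadExecutor();
    feeder.submit(() -> {
      for (int i = 1; i <= 3; i++) {
        drainer.queue.offer("Foo" + i);
      }
      drainer.waitTillConsumed();
      drainer.cancel();
      return null; // a Callable, so the checked InterruptedException may propagate
    });
    try {
      drainer.run(seen::add); // blocks until cancel() is observed, much like env.execute()
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      feeder.shutdown();
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(runOnce()); // prints [Foo1, Foo2, Foo3]
  }
}
```

No element is lost when cancel() lands right after the last poll: the polled element is emitted before the loop re-checks the flag. Note that an empty queue only means every element was handed to the emitter, not that downstream operators have finished with it.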
