
Currently, I have a basic Kafka Streams application whose Topology contains only a source and a processor, but no sink; the Topology only handles the consumption of messages. To produce messages, we call the Producer API from the ProcessorSupplier instance passed to the Topology, specifically in the overridden process method. I understand that the Producer API is redundant here, since I could simply add a sink to the topology, but I am in a position where I have to set up my streams application this way.

As for testing, I tried the TopologyTestDriver class available in the kafka-streams-test-utils package. However, I want to test not only the topology but also the calls to the Producer API. Using the TopologyTestDriver requires me to mock my Producer instance, since it is separate from the Streams API. As a result, because the messages are never actually forwarded, I am unable to read messages from the TopologyTestDriver in my unit tests.

Here is a simplified version of my process method:

@Override
public void process(String key, String value) {
    // some data processing stuff that I leave out for simplicity's sake
    String topic = "...";
    Properties props = ...;
    // the producer is actually created once elsewhere, not per message:
    //Producer<String, String> producer = new KafkaProducer<>(props);
    ProducerRecord<String, String> record = new ProducerRecord<>(topic, key, value);
    producer.send(record);
}
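One way to make a processor like this unit-testable is to receive the producer from outside (e.g. through the supplier's constructor) instead of referencing one created elsewhere. Here is a minimal plain-Java sketch of that shape; `SendClient`, `MyProcessor`, and `RecordingClient` are hypothetical stand-ins for the Kafka types, chosen only to illustrate the injection pattern:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Kafka Producer interface, for illustration only.
interface SendClient {
    void send(String topic, String key, String value);
}

// The processor receives its client through the constructor instead of creating it,
// so a unit test can pass a recording fake while production code passes the real producer.
class MyProcessor {
    private final SendClient client;
    private final String outputTopic;

    MyProcessor(SendClient client, String outputTopic) {
        this.client = client;
        this.outputTopic = outputTopic;
    }

    void process(String key, String value) {
        // ... data processing ...
        client.send(outputTopic, key, value);
    }
}

// A fake that records everything sent, playing the role Kafka's MockProducer would.
class RecordingClient implements SendClient {
    final List<Map.Entry<String, String>> sent = new ArrayList<>();

    @Override
    public void send(String topic, String key, String value) {
        sent.add(new SimpleEntry<>(key, value));
    }
}
```

With the real Kafka classes, the same shape works with `Producer<String, String>` in place of `SendClient`: the test constructs the processor with a mock producer and asserts on what was sent.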

And here is a simplification of my sample unit test:

@Test
public void process() {
    Topology topology = new Topology();
    topology.addSource("source", "input-topic");
    topology.addProcessor("processor", ..., "source");
    Properties props = ...;

    TopologyTestDriver testDriver = new TopologyTestDriver(topology, props);

    ConsumerRecordFactory<String, String> factory = new ConsumerRecordFactory<>(new StringSerializer(), new StringSerializer());
    // the following line will work fine as long as the producer is mocked
    testDriver.pipeInput(factory.create("input-topic", "key", "value"));

    // since the producer is mocked, no message can be read from the output topic
    ProducerRecord<String, String> outputRecord = testDriver.readOutput("output-topic", new StringDeserializer(), new StringDeserializer());

    assertNull(outputRecord); // passes, since nothing was actually forwarded
}

To sum up my question, is there a way to write a unit test that tests both the consumption and production of messages within a Topology that uses the Producer API for writing messages to outgoing topics?

Chris Gong

1 Answer


You should not use a custom Producer but should add a sink to your Topology. Calls to Producer.send() are async, and thus you might be subject to data loss. To avoid data loss, you would need to make the call sync, i.e., get the Future returned by send() and wait for its completion before process() returns. However, this has a big impact on your throughput and is not recommended.

If you add a sink instead, you avoid those issues: Kafka Streams then knows what data was sent to the output topic, so no data loss will happen, while Kafka Streams can still use the more performant async call.

Besides the correctness issue, it seems you create a new KafkaProducer for every message you process in your current code, which is quite inefficient. Furthermore, using a sink will simplify your code. And of course, you get proper testing capabilities using the TopologyTestDriver.
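To make the recommendation concrete, here is a sketch of what the sink-based topology could look like, reusing the topic names from the question. The processor mirrors the one in the question, with `context.forward()` in place of `producer.send()`; the class and node names are illustrative:

```java
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class SinkTopology {

    // Same shape as the processor in the question, but forwarding downstream
    // instead of calling a separate Producer.
    public static class ForwardingProcessor implements Processor<String, String> {
        private ProcessorContext context;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
        }

        @Override
        public void process(String key, String value) {
            // ... data processing ...
            context.forward(key, value); // routed to the downstream sink node
        }

        @Override
        public void close() {}
    }

    public static Topology build() {
        Topology topology = new Topology();
        topology.addSource("source", "input-topic");
        topology.addProcessor("processor", ForwardingProcessor::new, "source");
        // The sink replaces the hand-rolled producer.send() call; Kafka Streams
        // flushes the internal producer before committing offsets.
        topology.addSink("sink", "output-topic", "processor");
        return topology;
    }
}
```

With this topology, the TopologyTestDriver test from the question works as intended: pipeInput() on "input-topic" makes the forwarded record readable from "output-topic" via readOutput().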

Matthias J. Sax
  • Hey @Matthias J. Sax, thanks for responding. I just want to note that we are not creating a new `Producer` every time we get a message; that line was just to show that we use one to send messages. I do agree that using sinks would simplify the code tremendously; however, I am caught in a situation where I have to read data from and send data to topics in different clusters. I have read that one should use MirrorMaker in this case to replicate data from one cluster to another. If I don't have a topic to replicate from, is there anything else I can do in this situation? – Chris Gong Feb 10 '20 at 13:55
  • I see -- yes, using MM or some other replication tool is the recommended way. Otherwise, you need to make all `send()` calls sync, which is not desirable. -- If you really want to stay with your custom producer, you will need to "mock" the producer somehow -- maybe you can use a "supplier pattern" and switch the supplier in your test code. – Matthias J. Sax Feb 10 '20 at 15:36
  • Just wondering, is `context.forward` synchronous and not prone to data loss? Also, as for "mocking" the producer, would you recommend `EmbeddedKafka` for this approach, or does it not matter? For example, could I just use a `MockProducer` object instead? – Chris Gong Feb 10 '20 at 19:14
  • If you use a sink, Kafka Streams will use async send calls; however, before offsets are committed, it will call `producer.flush()` to ensure all data is written, which avoids data loss. -- You can use `EmbeddedKafka`, but it's more heavyweight and I think not necessary. Just using a mocked producer should be sufficient. – Matthias J. Sax Feb 10 '20 at 22:16
  • Would it make sense to use `producer.flush()` in my case then? Correct me if I'm wrong, but it seems that flushing the producer is synonymous with making the `send` call synchronous. However, it's still not as good performance-wise as using `context.forward()`. And just another clarification, does `context.forward()` call `context.commit()`? From what I understand, the forward method calls flush and then commit, and there is some additional logic to make sure that records are still written in the order they are read? – Chris Gong Feb 13 '20 at 13:38
  • Yes, flushing the producer would be equivalent. -- `forward()` calls neither `commit()` nor `flush()` -- the `StreamThread` commits based on the `commit.interval.ms` config though. – Matthias J. Sax Feb 13 '20 at 18:52
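To sketch the `MockProducer` option discussed in the comments above: `MockProducer` implements the `Producer` interface and records every sent record, so a unit test can assert on what the processor produced. How the mock reaches the processor is an assumption here (e.g. via the supplier pattern suggested above); for brevity this sketch uses it directly:

```java
import java.util.List;

import org.apache.kafka.clients.producer.MockProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MockProducerSketch {
    public static void main(String[] args) {
        // autoComplete=true completes each send() future immediately,
        // so the test never blocks waiting for acks.
        MockProducer<String, String> mockProducer =
                new MockProducer<>(true, new StringSerializer(), new StringSerializer());

        // Assumption: hand the mock to the processor the same way the real
        // KafkaProducer is provided; here we just use it directly.
        Producer<String, String> producer = mockProducer;
        producer.send(new ProducerRecord<>("output-topic", "key", "value"));

        // history() returns every record the mock has "sent", in order,
        // which is what the unit test asserts on.
        List<ProducerRecord<String, String>> sent = mockProducer.history();
        System.out.println(sent.size());
    }
}
```

Unlike `EmbeddedKafka`, this runs without any broker, which matches the recommendation above that a mocked producer is sufficient for unit tests.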