
Here we go: I have a fairly complicated topology of various joins, aggregations, filters, maps, etc. By default, the NUM_STREAM_THREADS_CONFIG parameter equals 1, which is completely deterministic by definition; thus each partition's total ordering (which Kafka itself guarantees) is preserved.

Will total ordering be preserved once I set NUM_STREAM_THREADS_CONFIG to 2 or more? Does it depend on the specific topology? I've checked the docs and went through the threading model section, yet did not find an answer.

Zazaeil

1 Answer


Data is always processed in per-partition offset order, even if you set num.stream.threads to a larger value.

In Kafka Streams, sub-topologies are translated into tasks (based on input topic partitions) and tasks process records of their partitions in offset order. The number of tasks limits the number of threads you can keep busy (similar to the maximum number of consumers in a consumer group). If you configure more threads than available tasks, some threads just stay idle.
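To make the arithmetic concrete, here is a minimal sketch in plain Java. It does not use the Kafka Streams API; `TaskMath` and its methods are hypothetical names, and the model assumes that each sub-topology gets as many tasks as the largest partition count among its input topics, and that the totals of all sub-topologies add up.

```java
import java.util.List;

// Simplified model of the task arithmetic; NOT Kafka Streams code.
// All class and method names here are hypothetical.
class TaskMath {

    // Tasks of one sub-topology = max partition count among its input topics.
    static int tasksForSubTopology(List<Integer> inputTopicPartitionCounts) {
        int max = 0;
        for (int partitions : inputTopicPartitionCounts) {
            max = Math.max(max, partitions);
        }
        return max;
    }

    // Assumption: tasks of different sub-topologies simply add up.
    static int totalTasks(List<List<Integer>> subTopologies) {
        int total = 0;
        for (List<Integer> inputs : subTopologies) {
            total += tasksForSubTopology(inputs);
        }
        return total;
    }

    // Threads beyond the number of tasks have nothing to do and stay idle.
    static int busyThreads(int configuredThreads, int totalTasks) {
        return Math.min(configuredThreads, totalTasks);
    }

    public static void main(String[] args) {
        // Sub-topology 0 reads topics with 4 and 2 partitions;
        // sub-topology 1 reads one topic with 3 partitions.
        List<List<Integer>> topology = List.of(List.of(4, 2), List.of(3));
        int tasks = totalTasks(topology);
        System.out.println("tasks = " + tasks);
        System.out.println("busy threads (10 configured) = " + busyThreads(10, tasks));
    }
}
```

The thread count here is the total across all running instances of the application, so in this model adding threads past the task count changes nothing about ordering or throughput.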

If a task processes data from multiple topics/partitions, there is no strict ordering guarantee for data of different partitions. Kafka Streams will take the record timestamps into account though, and process records with smaller timestamps first.
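The following plain-Java sketch illustrates that behaviour under simplifying assumptions. It is a toy model, not the actual Kafka Streams implementation: `TimestampOrderSketch`, `Rec`, and `process` are hypothetical names, and all records are assumed to be buffered up front. The task repeatedly looks at the head record of each partition buffer and takes the one with the smallest timestamp, so offset order within a partition is never violated.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of one task reading several partitions; NOT Kafka Streams code.
// All names here are hypothetical.
class TimestampOrderSketch {

    static final class Rec {
        final String partition;
        final long offset;
        final long timestamp;

        Rec(String partition, long offset, long timestamp) {
            this.partition = partition;
            this.offset = offset;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return partition + "@" + offset + " (ts=" + timestamp + ")";
        }
    }

    // Each deque holds one partition's records in offset order.
    // Only the head of a buffer is ever eligible, which keeps per-partition order.
    static List<Rec> process(List<Deque<Rec>> partitionBuffers) {
        List<Rec> processed = new ArrayList<>();
        while (true) {
            Deque<Rec> next = null;
            for (Deque<Rec> buffer : partitionBuffers) {
                if (buffer.isEmpty()) {
                    continue;
                }
                if (next == null
                        || buffer.peekFirst().timestamp < next.peekFirst().timestamp) {
                    next = buffer; // head record with the smallest timestamp so far
                }
            }
            if (next == null) {
                return processed; // every buffer is drained
            }
            processed.add(next.pollFirst());
        }
    }

    public static void main(String[] args) {
        // Partition A-0 contains an out-of-order timestamp (40 before 30).
        Deque<Rec> a = new ArrayDeque<>(List.of(
                new Rec("A-0", 0, 10), new Rec("A-0", 1, 40), new Rec("A-0", 2, 30)));
        Deque<Rec> b = new ArrayDeque<>(List.of(
                new Rec("B-0", 0, 20), new Rec("B-0", 1, 25)));
        for (Rec r : process(List.of(a, b))) {
            System.out.println(r);
        }
    }
}
```

In this model the record at offset 2 of A-0 is handled after offset 1 even though its timestamp is smaller: timestamps only decide which partition to read from next, never the order inside a partition.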

Matthias J. Sax
  • Is there a way to measure the number of tasks generated by my topology? – Zazaeil Jun 03 '20 at 15:26
  • Not exactly sure what you mean by "measure". If you know the number of partitions per input topic, you can do the math. The number of tasks always corresponds to the max number of partitions of all input topics of a sub-topology. Also, Kafka Streams will log which tasks are created. Last, Kafka Streams exposes per-task metrics: https://docs.confluent.io/current/streams/monitoring.html. There is also `KafkaStreams#localThreadMetadata()`, which tells you what tasks are assigned to your local threads. – Matthias J. Sax Jun 03 '20 at 19:28