
Apache Samza's documentation states that it can be run with multiple threads per worker:

Threading model and ordering

Samza offers a flexible threading model to run each task. When running your applications, you can control the number of workers needed to process your data. You can also configure the number of threads each worker uses to run its assigned tasks. Each thread can run one or more tasks. Tasks don’t share any state - hence, you don’t have to worry about coordination across these threads.

As I understand it, this means Samza uses the same architecture as Kafka Streams, i.e., tasks are statically assigned to threads. A reasonable choice would then be to set the number of threads roughly equal to the number of CPU cores. Does that make sense?
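
To make the model I have in mind concrete, here is a minimal Java sketch of static task-to-thread assignment. It is not Samza's or Kafka Streams' actual code: the class `StaticTaskAssignment`, the method `assign`, and the round-robin policy are all hypothetical and only illustrate the idea that each task is pinned to exactly one thread.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticTaskAssignment {

    // Hypothetical helper: distributes task IDs 0..numTasks-1 over the
    // given number of threads in round-robin order. Each task ends up
    // on exactly one thread, so a task is never processed concurrently.
    static List<List<Integer>> assign(int numTasks, int numThreads) {
        List<List<Integer>> perThread = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            perThread.add(new ArrayList<>());
        }
        for (int task = 0; task < numTasks; task++) {
            perThread.get(task % numThreads).add(task);
        }
        return perThread;
    }

    public static void main(String[] args) {
        // Assumption from the question: one thread per available CPU core.
        int numThreads = Runtime.getRuntime().availableProcessors();
        List<List<Integer>> assignment = assign(8, numThreads);
        for (int t = 0; t < assignment.size(); t++) {
            System.out.println("thread " + t + " -> tasks " + assignment.get(t));
        }
    }
}
```

Under this model, ordering within a task is preserved because a single thread handles all of that task's messages.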

I am now wondering how to configure the number of threads in Samza. I found the option job.container.thread.pool.size. However, its description reads as if it does something different, namely running the operations of tasks in parallel, which might impair ordering. It also confuses me that the default value is 0 rather than 1.

Sören Henning