1

I am trying to determine how to divide the task slots for my Flink job. More specifically, is there a reason to use two (or more) task slots per task manager instead of one? I read that multiple task slots per task manager help reduce network overhead, but is there any other benefit?

In addition, I wonder whether there is any benefit to 'stand-by' task slots (that is, setting the parallelism to a value smaller than the number of available task slots).

Thanks in advance :)

JoeHills

2 Answers

2

Another factor to consider: a TM with multiple slots will have all of those slots running in the same JVM. So if you are using the heap-based state backend, then all of those objects for all of those slots will be handled by the same garbage collector -- which will lead to longer GC stalls (with the older JVMs that Flink still requires). This will be less of a factor someday when Flink supports more modern garbage collectors, and isn't an issue with RocksDB.

David Anderson
1
  • Better resource utilisation: A task manager with n slots dedicates 1/n of its managed memory to each slot. More subtasks then run in the same JVM, where they share TCP connections (via multiplexing) and heartbeat messages. They may also share data sets and data structures, which reduces the per-task overhead. A good default number of task slots is the number of CPU cores.
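
To make the 1/n split concrete, here is a minimal sketch of the arithmetic in Python. The helper name and the 4096 MB figure are illustrative assumptions, not Flink APIs or defaults; in Flink itself the slot count is set with the `taskmanager.numberOfTaskSlots` option.

```python
import os


def managed_memory_per_slot(managed_memory_mb: float, num_slots: int) -> float:
    """Hypothetical helper: even split of a task manager's managed memory across its slots."""
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    return managed_memory_mb / num_slots


# Rule of thumb from the answer: one slot per CPU core.
# os.cpu_count() can return None, so fall back to 1.
suggested_slots = os.cpu_count() or 1

# Assumed example: a task manager with 4096 MB of managed memory.
for slots in (1, 2, 4):
    share = managed_memory_per_slot(4096, slots)
    print(f"{slots} slot(s): {share} MB of managed memory per slot")
```

The more slots a task manager has, the smaller each slot's share, so a memory-hungry job may do better with fewer slots per task manager.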

There is no single correct configuration; it depends entirely on the application and the use case.

A configuration with one slot per task manager provides isolation and is easier to manage, but it might leave some memory unused. On the other hand, with multiple slots and multiple pipelines/jobs, tasks of different jobs might get scheduled on the same task manager. If that task manager then goes down because of one job's high memory consumption, every job with tasks running on it is restarted. If you only run a single job per cluster, multiple slots per TM might be fine.

Stand-by task slots help when a task manager goes down or a pipeline restarts. In these cases the job manager assigns the affected tasks to the stand-by slots, and the application experiences less downtime.
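
As a rough illustration of how much headroom stand-by slots give, the sketch below counts spare slots and checks whether the job still fits after losing one task manager. It assumes the job needs exactly as many slots as its parallelism (which holds with default slot sharing); the function names and cluster sizes are hypothetical.

```python
def spare_slots(num_task_managers: int, slots_per_tm: int, parallelism: int) -> int:
    """Slots left idle once the job is scheduled (negative means the job does not fit)."""
    return num_task_managers * slots_per_tm - parallelism


def survives_one_tm_loss(num_task_managers: int, slots_per_tm: int, parallelism: int) -> bool:
    """True if the remaining task managers still offer enough slots for the job."""
    remaining_slots = (num_task_managers - 1) * slots_per_tm
    return remaining_slots >= parallelism


# Assumed example: 3 task managers with 2 slots each, job parallelism 4.
print(spare_slots(3, 2, 4))           # 2 stand-by slots
print(survives_one_tm_loss(3, 2, 4))  # True: 4 slots remain for parallelism 4
print(survives_one_tm_loss(2, 2, 4))  # False: only 2 slots remain
```

With no spare slots, the job has to wait for a replacement task manager before it can restart.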

  • 1
    Thanks! I also wonder: if I run only one job on my cluster and the number of task slots equals the parallelism, will each task slot run an entire pipeline (all the operators)? Or will some instances of the same operator run on a single slot? – JoeHills Aug 10 '22 at 12:22