
I am trying to understand the mechanism that Flink, or more specifically the JobManager, follows to deploy tasks onto a TaskManager or a task slot. I will try to explain myself in three questions:

1- Does the JobManager deal directly with the task slots and deploy a task onto a specific slot? If yes, on what basis does the JobManager choose a slot? What if not all slots have equal resources?

2- If the JobManager only submits tasks to TaskManagers, how is a TaskManager chosen? How does the TaskManager then assign the task to one of its slots? And what if the TaskManagers do not all have the same resources?

3- Suppose the whole pipeline has the same parallelism (e.g. 256) except for the last transformation (assume it is a map) and the sink, which have half that parallelism (128). In this case a data shuffle will follow, and some of the slots or TaskManagers should be released (as they are no longer needed for this job) while others keep working. How does Flink determine which slots/TaskManagers are released and which continue doing the job?

Flink version used: 1.17.1.

I am aware that Flink will try to deploy a producer and its consumer on the same TaskManager, and that when a groupBy or rebalance is applied a data shuffle will follow. The data is then distributed either based on the hash value of the key (groupBy) or in a round-robin fashion (rebalance / change of parallelism).
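To make clear what I mean by the two distribution strategies, here is a minimal plain-Java sketch of my understanding. It does not use any Flink classes and is not Flink's actual implementation; the class and method names (`PartitioningSketch`, `hashPartition`, `roundRobinPartition`) are hypothetical and only there for illustration, and the numbers 256 and 128 are taken from question 3.

```java
// Illustration only: a simplified model of hash vs. round-robin distribution.
// This is NOT how Flink implements partitioning internally.
class PartitioningSketch {

    // Hash partitioning (what I assume groupBy does): records with the same
    // key always map to the same downstream subtask index.
    static int hashPartition(Object key, int parallelism) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Round-robin (what I assume rebalance does): the i-th record emitted by
    // a sender goes to downstream subtask (i mod parallelism).
    static int roundRobinPartition(long recordIndex, int parallelism) {
        return (int) (recordIndex % parallelism);
    }

    public static void main(String[] args) {
        int downstreamParallelism = 128; // the last map and the sink in question 3

        // Spread 1024 records round-robin and count how many each subtask gets.
        int[] counts = new int[downstreamParallelism];
        for (long i = 0; i < 1024; i++) {
            counts[roundRobinPartition(i, downstreamParallelism)]++;
        }
        System.out.println("records on subtask 0: " + counts[0]);

        // The same key is always routed to the same subtask.
        int first = hashPartition("some-key", downstreamParallelism);
        int second = hashPartition("some-key", downstreamParallelism);
        System.out.println("same target for same key: " + (first == second));
    }
}
```

In this simplified model, all 128 downstream subtasks receive an equal share of the records under round-robin, while under hashing the share depends on the key distribution. What I cannot see from this is which slots those 128 subtasks end up in.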

I have spent some time looking through the Flink source code (especially flink-runtime), but it is not clear where exactly to look, because many classes seem to be related to this topic. I would appreciate any help, or even a pointer to another answer or reference.

Note: I am only working on my laptop (there is no cluster), and I am using the DataSet API.

Thank you all.

Mahmoud