I am trying to set up a Flink YARN session to run 100+ batch jobs. After ~40 task managers had connected (each task manager with 2 slots and 1 GB of memory) and ~10 jobs were running, the session appeared to become unstable, even though enough resources were available. The Flink UI suddenly became unavailable; I suspect the job manager had already died by then. Eventually, the YARN application was killed as well.
The job manager is running on a 4-core, 16 GB node with 12 GB available.
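As a rough back-of-the-envelope sketch, the size of the session at the point where it became unstable works out as below. This only restates the approximate numbers above; it assumes the 1 GB figure is per task manager (not per slot), and the variable names are purely illustrative.

```python
# Approximate figures taken from the description above.
task_managers = 40        # ~40 task managers connected
slots_per_tm = 2          # 2 slots per task manager
tm_memory_gb = 1          # assumption: 1 GB per task manager, not per slot
running_jobs = 10         # ~10 jobs running when the session became unstable

# Aggregate load the single job manager has to coordinate.
total_slots = task_managers * slots_per_tm
total_tm_memory_gb = task_managers * tm_memory_gb
slots_per_running_job = total_slots / running_jobs

print(f"total slots: {total_slots}")
print(f"total task manager memory: {total_tm_memory_gb} GB")
print(f"slots per running job (average): {slots_per_running_job}")
```

So the job manager was coordinating roughly 80 slots across about 40 GB of task manager memory when the problem appeared.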
Is there any guide for working out how much job manager resource is needed for a given number of task managers?