What happens when an exception is thrown from the application jar to the Task Manager while processing an event?

a) Does the Flink Job Manager kill the existing Task Manager and create a new one?

b) Does the Task Manager itself recover from the failed execution and restart processing using the local state saved in RocksDB?

java.lang.IllegalArgumentException: "Application error-stack trace"

My concern is that if the same kind of erroneous event is processed by every available Task Manager, they all get killed and the entire Flink job goes down.

I have noticed that when an application error occurs, the entire job eventually goes down.

I have not figured out the exact reason yet.

tom

1 Answer

In general, an exception in the job should not cause the whole Task Manager to go down. We are talking about "normal" exceptions here. In that case the job itself fails, and Flink either tries to restart it or not, depending on the configured restart strategy.

Obviously, if your Task Manager dies for some reason, for example due to timeouts, it will not be restarted automatically unless you use a resource manager or orchestration tool such as YARN or Kubernetes. In that case the job should be started again once slots are available.

As for the behaviour you describe, where the job itself is "going down", I assume the job is simply moving to the FAILED state. Different restart strategies have different thresholds for the maximum number of retries, and if the job still does not work after the specified number of restarts, it goes to the FAILED state.
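To make the retry threshold concrete, here is a minimal plain-Java sketch of a bounded restart loop. It is not Flink API code: `RestartSimulation`, `process`, `run`, and the rule that negative values are "bad events" are all hypothetical names and assumptions made for illustration. It assumes that after each failure the job resumes at the event that failed, so an event that always throws uses up every restart.

```java
// Plain-Java model of a bounded restart strategy. This is NOT Flink API code;
// all names here are hypothetical and exist only for illustration.
final class RestartSimulation {

    enum JobState { FINISHED, FAILED }

    /** Hypothetical user function: a negative value stands in for an erroneous event. */
    static void process(int event) {
        if (event < 0) {
            throw new IllegalArgumentException("Application error for event " + event);
        }
    }

    /**
     * Processes the events in order. After a failure the "job" restarts and
     * resumes at the failing event, at most maxRestarts times.
     */
    static JobState run(int[] events, int maxRestarts) {
        int restartsUsed = 0;
        int position = 0;
        while (position < events.length) {
            try {
                process(events[position]);
                position++;
            } catch (IllegalArgumentException e) {
                if (restartsUsed == maxRestarts) {
                    // Restart budget exhausted: the job ends up in FAILED.
                    return JobState.FAILED;
                }
                restartsUsed++; // restart and retry the same event
            }
        }
        return JobState.FINISHED;
    }

    public static void main(String[] args) {
        System.out.println(run(new int[] {1, 2, 3}, 3));  // prints FINISHED
        System.out.println(run(new int[] {1, -1, 3}, 3)); // prints FAILED
    }
}
```

In this model a single event that fails deterministically is enough to bring the job to FAILED, which would match the symptom described in the question if the same erroneous events are replayed after every restart.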

Dominik Wosiński
Let's say multiple apps are running in that Task Manager (multiple slots). If the Task Manager itself goes down, won't it impact the other apps running in different slots of the same Task Manager? – Abhishek May 22 '21 at 03:05