
I noticed that each time I run a new job, it takes around 20% longer than when I launch the same job again.

Does Flink cache some results and reuse them if a job is run multiple times? If so, how can I control this?

I would like to measure how long my tasks run, but each time I rerun them they are faster than before.
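One way to put a number on the difference is to submit the same job several times in a row and compare the first run with the later ones. Below is a minimal sketch in Python, assuming `flink run` is used in its default attached mode, where the client blocks until the job finishes. The measured wall-clock time therefore includes client start-up and job submission, not only task execution. The helper names `time_runs` and `summarize` are hypothetical; the command is the one from the question.

```python
import statistics
import subprocess
import time
from typing import Dict, List, Sequence

# Submission command taken from the question; adjust class and jar path for your job.
FLINK_CMD = ["./bin/flink", "run", "--class", "org.myJob.Main", "/home/dab/myJob.jar"]


def time_runs(cmd: Sequence[str], repeats: int = 5) -> List[float]:
    """Run `cmd` `repeats` times in a row; return each wall-clock duration in seconds."""
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed submission is never recorded as a (suspiciously fast) run.
        subprocess.run(list(cmd), check=True)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: Sequence[float]) -> Dict[str, float]:
    """Compare the first (cold) run against the median of the later (warm) runs."""
    if len(durations) < 2:
        raise ValueError("need at least two runs to compare first and later runs")
    first, rest = durations[0], list(durations[1:])
    warm = statistics.median(rest)
    return {
        "first_s": first,
        "warm_median_s": warm,
        # Positive means the first run was slower than a typical later run.
        "first_slower_by_pct": 100.0 * (first - warm) / warm,
    }
```

Usage: `print(summarize(time_runs(FLINK_CMD, repeats=5)))`. Comparing against the median of several later runs, rather than a single rerun, keeps one noisy measurement from dominating the comparison.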

Olaf Kock
nanobot
  • When you restart, are you restarting without state, or are you resuming from a checkpoint or savepoint? – David Anderson Dec 08 '21 at 11:20
  • @DavidAnderson I'm not using checkpoints, at least not actively; I don't know whether Flink does something like that on its own. I restart the task by running it again from the console: ./bin/flink run --class org.... – nanobot Dec 19 '21 at 13:29

1 Answer


If you are using stateful functions and have configured checkpoints or savepoints, your job may take some time to restore its state from a checkpoint.

In order to make state fault tolerant, Flink needs to checkpoint the state. Checkpoints allow Flink to recover state and positions in the streams to give the application the same semantics as a failure-free execution.
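To make the idea of "recovering state and stream positions" concrete, here is a toy model in plain Python. It is not Flink's API and not how Flink implements checkpointing (Flink takes distributed snapshots across parallel operators); the class and method names are hypothetical. A counting job snapshots its state together with the source offset it has reached, and after a simulated failure it rolls back to that snapshot and replays from the stored offset, ending with the same result as a failure-free run.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    """A consistent copy of operator state plus the input position it corresponds to."""
    offset: int
    counts: Dict[str, int]


@dataclass
class CountingJob:
    """Toy stateful job (hypothetical): counts events per key, snapshotting every `interval` events."""
    interval: int
    counts: Dict[str, int] = field(default_factory=dict)
    offset: int = 0  # index of the next event to read from the replayable source
    last_snapshot: Optional[Snapshot] = None

    def checkpoint(self) -> None:
        # Copy the dict so later updates do not mutate the stored snapshot.
        self.last_snapshot = Snapshot(self.offset, dict(self.counts))

    def restore(self) -> None:
        # Without any snapshot, restart from scratch: empty state, offset 0.
        snap = self.last_snapshot or Snapshot(0, {})
        self.offset, self.counts = snap.offset, dict(snap.counts)

    def run(self, events: List[str], fail_at: Optional[int] = None) -> Dict[str, int]:
        failed = False
        while self.offset < len(events):
            if self.offset == fail_at and not failed:
                # Simulated crash: discard live state and roll back to the last snapshot;
                # events after the snapshot's offset are then replayed from the source.
                failed = True
                self.restore()
                continue
            key = events[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1
            if self.offset % self.interval == 0:
                self.checkpoint()
        return self.counts
```

With `events = ["a", "b", "a", "c", "a", "b", "c"]`, both `CountingJob(interval=3).run(events)` and `CountingJob(interval=3).run(events, fail_at=5)` end with `{"a": 3, "b": 2, "c": 2}`: the events processed after the last snapshot are counted exactly once because their partial updates are thrown away and re-applied during replay.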

More about checkpointing and here.

Monitoring checkpointing

Niko
  • Hi, thanks. I checked, and I'm not using checkpoints, at least not on purpose. I don't know whether Flink creates them on its own. Also, none of my jobs shows a "Checkpoint Details" section, so I assume I'm not using any checkpoints. So I'm wondering why my first run is significantly slower than later runs. I rerun my jobs from the console: ./bin/flink run --class org.myJob.Main /home/dab/myJob.jar – nanobot Dec 19 '21 at 13:33