
I have ~20 scheduled methods for automation. They all run fine, but they all stop after a few hours.

My logs don't show any errors or a server crash; it's just as if Spring Boot decided not to run any @Scheduled methods anymore.

My first intuition was that there was probably an infinite loop in the body of a method. However, all my methods have loggers at the start and at the end, i.e. if there were an infinite loop, my final log wouldn't say [foo finished successfully].

I even created a tester that just prints every 5 minutes, and that method also stopped, along with all the other ones, after a few hours.
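To illustrate, a stripped-down version of that tester looks roughly like this (the class name, method name, and log messages are placeholders, and it assumes `@EnableScheduling` is declared on a configuration class):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Placeholder heartbeat job mirroring the 5-minute tester described above.
// Assumes @EnableScheduling is declared on a @Configuration class.
@Component
public class HeartbeatJob {

    private static final Logger log = LoggerFactory.getLogger(HeartbeatJob.class);

    // fixedRate is in milliseconds: 5 * 60 * 1000 = every 5 minutes
    @Scheduled(fixedRate = 5 * 60 * 1000)
    public void heartbeat() {
        log.info("[heartbeat started]");
        // the real jobs do their work here, each wrapped in the same start/end logging
        log.info("[heartbeat finished successfully]");
    }
}
```

All ~20 real methods follow this same start/end logging pattern.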

My second intuition was to check the file size: maybe the log file was too big, the logger stopped writing to it, and somehow that made the automation stop (scraping the bottom of the barrel at this point). But since the automation only ran for a few hours, the file size is only ~1200 KB, so this was not the issue.

Basically, because of the way my loggers are set up I don't think there's an infinite loop anywhere, I'm not getting any error messages in my logs, and I don't know how to debug this.

I tried to include as much useful information as possible; if something is unclear or missing, please let me know.

Other than that, any ideas on how to debug or what could be causing this?

Javier
  • Without seeing your configuration it is hard to tell. However, I would suggest checking your memory and the number of available threads. I have a feeling you cannot construct any more threads (and are using the "wrong" `TaskScheduler`), and thus a symptom is no more executions. – M. Deinum Jun 30 '20 at 05:37

1 Answer


The problem I was encountering was quite unique; posting an answer in case it helps someone in the future.

  1. The loggers stopped writing to the log file even though they weren't meant to stop (a separate issue I'll fix eventually). This meant that even if the last log in the file said [foo finished successfully], it wasn't necessarily the last real log of the application.

  2. Since scheduling in Spring is single-threaded by default, a poorly optimized method call that takes ~12h to complete made it look as if the automation had stopped: the logs weren't updating and no other scheduled method was running (see the configuration sketch after this list).

  3. I never let this ~12h method call finish; had I waited the 12 hours, I would've realized that the automation hadn't stopped, it simply had a bottleneck method. I would always restart the automation before this method could finish, and that made it look as if the automation had indeed stopped for an unknown reason.
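If the single-threaded default is the underlying constraint (as both M. Deinum's comment and point 2 suggest), one option is to declare your own `TaskScheduler` bean so one long-running job can't block every other @Scheduled method. This is only a sketch, not what my application actually ran with, and the pool size of 10 is arbitrary:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;

// Assumed configuration: give @Scheduled methods their own pool so one
// long-running job cannot block every other scheduled method.
@Configuration
@EnableScheduling
public class SchedulerConfig {

    @Bean
    public ThreadPoolTaskScheduler taskScheduler() {
        ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();
        scheduler.setPoolSize(10);                        // default pool size is 1
        scheduler.setThreadNamePrefix("scheduled-task-"); // easier to spot in thread dumps
        return scheduler;
    }
}
```

On Spring Boot 2.1+ you can get the same effect without a custom bean by setting the `spring.task.scheduling.pool-size` property.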

How I found out: in my case, the application runs in a container, and once it's running I just CTRL + Z it. I had a feeling the loggers weren't working properly, so once the log file stopped updating, I checked the live output of the application instead by typing fg, and realized that even though the log file wasn't updating, the server was still running fine.

Javier