
We have a Spring 3 web application on Tomcat 6 that uses several scheduled services via @Scheduled (mainly for jobs that run every night). Sometimes (rarely, perhaps once every two months) the scheduler thread appears to stop working, so none of the jobs are executed the following night. There is no exception or log entry in our log files.

Does anybody have an idea why this is happening, or how to get more information about the problem?

Is there a way to detect this situation within the application and to restart the scheduler?

Currently we work around this by also having a logging job that runs every 5 minutes and writes a log entry. If the log file stops being updated (monitored by Nagios), we know it is time to restart Tomcat. It would be nice to restart the jobs without a complete server restart.
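The same heartbeat idea can also be checked from inside the application. A minimal sketch, assuming a plain-JDK watchdog (the class and method names are hypothetical): a frequent job on the same scheduler as the nightly jobs calls `beat()`, and a watchdog running on its own thread invokes a callback once the beats stop. Restarting the stuck scheduler is not attempted here; raising an alert or taking a thread dump at that moment is the safer first step.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-process heartbeat: jobs on the main scheduler beat, a separate watchdog checks. */
final class SchedulerHeartbeat {
    private final AtomicLong lastBeatMillis = new AtomicLong(System.currentTimeMillis());
    private final long maxSilenceMillis;

    SchedulerHeartbeat(long maxSilenceMillis) {
        this.maxSilenceMillis = maxSilenceMillis;
    }

    /** Call from a frequent job on the SAME scheduler as the nightly jobs (e.g. the 5-minute logging job). */
    void beat() {
        lastBeatMillis.set(System.currentTimeMillis());
    }

    /** True if no beat has been recorded for longer than the allowed silence. */
    boolean isStale(long nowMillis) {
        return nowMillis - lastBeatMillis.get() > maxSilenceMillis;
    }

    /** Starts a watchdog on its OWN thread, so it keeps running even if the job scheduler hangs. */
    ScheduledExecutorService startWatchdog(long checkEveryMillis, Runnable onStale) {
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "scheduler-watchdog");
            t.setDaemon(true); // do not keep the JVM alive just for the watchdog
            return t;
        });
        watchdog.scheduleWithFixedDelay(() -> {
            try {
                if (isStale(System.currentTimeMillis())) {
                    onStale.run(); // e.g. log an alert or trigger a thread dump
                }
            } catch (RuntimeException e) {
                // An exception escaping a periodic task would silently cancel all later runs.
                e.printStackTrace();
            }
        }, checkEveryMillis, checkEveryMillis, TimeUnit.MILLISECONDS);
        return watchdog;
    }
}
```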

obecker
    What is the work being done in the scheduled tasks? Is it possible that something becomes stuck in an infinite loop? I ask because the scheduled tasks, by default, use a threadpool of 1 thread, and if it gets hung somehow, your future tasks will not be started (but I am sure they would be queued). – nicholas.hauschild Jul 28 '13 at 17:23
  • @nicholas.hauschild It calls an external REST webservice. So you are saying that such a request might possibly block (deadlock?) and therefore stop all other jobs. I think I will request a thread dump of the server if this happens again. Thanks for your input. – obecker Jul 29 '13 at 18:31
  • Taking a thread dump will probably be a good idea. – nicholas.hauschild Jul 29 '13 at 19:11
  • Things to consider 1 - Enforce a timeout on the call to the REST service. Maybe even spawn that call in a separate thread and kill it if there is no response within a specified time. – Steve Nov 06 '13 at 09:45
  • Things to consider 2 - Control scheduling from outside your web application. It tends to be more reliable/controllable that way. Maybe take a look at Spring Batch as a means of controlling and monitoring jobs. – Steve Nov 06 '13 at 09:46
  • If it's in Tomcat, have you checked `localhost.log`? Uncaught exceptions usually end up there. You may also want to enable the scheduler's `continueScheduledExecutionAfterException` setting. – hsluo Apr 04 '14 at 17:29
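Steve's first suggestion (run the REST call on a separate thread and give up after a deadline) can be sketched with the JDK's `ExecutorService` and `Future.get` with a timeout. This is a minimal sketch; `TimeLimitedCall` and `callWithTimeout` are hypothetical names. Java cannot kill a thread: `Future.cancel(true)` only interrupts it, and a thread blocked in classic socket I/O usually ignores interrupts, so the worker may linger until the socket itself times out. The wrapper frees the scheduler thread, but real timeouts on the HTTP client are still needed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Runs a possibly hanging call on a worker thread and stops waiting after a deadline. */
final class TimeLimitedCall {
    private static final ExecutorService WORKERS = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "remote-call-worker");
        t.setDaemon(true); // a hung call should not keep the JVM alive
        return t;
    });

    /** Returns the call's result, or throws TimeoutException if it takes longer than timeoutMillis. */
    static <T> T callWithTimeout(Callable<T> call, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<T> future = WORKERS.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; the caller (the scheduler thread) is free again either way.
            future.cancel(true);
            throw e;
        }
    }
}
```

Because the pool is unbounded, calls that never return still accumulate worker threads, which is one more reason to set socket timeouts as well.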

3 Answers


Since this question got so many votes, I'll post what the (probably very specific) solution to my problem was.

We are using the Apache HttpClient library to make calls to remote services in the scheduled jobs. Unfortunately, no timeouts are set by default when performing requests. After setting `connectTimeout`, `connectionRequestTimeout`, and `socketTimeout` to 30 seconds, the problem was gone.

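```java
import org.apache.http.client.HttpClient;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.HttpClients;

int timeout = 30 * 1000; // 30 seconds
RequestConfig requestConfig = RequestConfig.custom()
        .setConnectTimeout(timeout)           // establishing the TCP connection
        .setConnectionRequestTimeout(timeout) // waiting for a connection from the pool
        .setSocketTimeout(timeout).build();   // waiting for data on the socket
HttpClient client = HttpClients.custom()
        .setDefaultRequestConfig(requestConfig).build();
```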
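If some jobs use the JDK's built-in `HttpURLConnection` instead of Apache HttpClient, the corresponding settings are `setConnectTimeout` and `setReadTimeout`; a value of 0 is interpreted as an infinite timeout. A minimal sketch (the helper name `openWithTimeouts` is hypothetical, and it assumes an http or https URL). `HttpURLConnection` has no separate setting corresponding to `connectionRequestTimeout`, which in HttpClient limits the wait for a pooled connection.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class JdkHttpTimeouts {
    /** Creates (but does not yet connect) an HttpURLConnection with both timeouts set. */
    static HttpURLConnection openWithTimeouts(String url, int timeoutMillis) throws IOException {
        // openConnection() only creates the object; nothing is sent until connect() or getInputStream().
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(timeoutMillis); // limit for establishing the TCP connection
        conn.setReadTimeout(timeoutMillis);    // limit for each blocking read from the socket
        return conn;
    }
}
```

For example, `openWithTimeouts(url, 30 * 1000)` mirrors the 30-second values from the HttpClient configuration.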
obecker
    I was facing the EXACT same problem down to using Apache HttpClient.... You, my friend, are a gentleman and a scholar! – Nicholas Terry Aug 13 '16 at 00:39
    This was indeed my problem as well, specifically was using Jersey with the ApacheConnector configured with a PoolingHttpClientConnectionManager. It's critical to set the **connectionRequestTimeout** parameter as the pool could hang indefinitely if this is not set. To do this, you have to set it in a RequestConfig and set the entire request config in the connector client config like so: `RequestConfig rc = RequestConfig.custom().setConnectTimeout(2000).setSocketTimeout(2000).setConnectionRequestTimeout(200).build(); clientConfig.property(ApacheClientProperties.REQUEST_CONFIG, rc);` – David Dec 08 '16 at 20:58
  • @David is right. I ran into the same situation: I had socketTimeout and connectTimeout set, but connectionRequestTimeout must be set as well. – mulya Nov 17 '22 at 10:44
  • I don't understand why this prevents the @Scheduled job from running. Wouldn't this just prevent that individual run from succeeding due to a timeout? – ndtreviv Mar 06 '23 at 10:16
  • @ndtreviv The scheduler won't start a new run if the previous one hasn't been finished yet. – obecker Jul 10 '23 at 15:03
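The effect described in nicholas.hauschild's comment and obecker's reply can be reproduced with a plain JDK `ScheduledExecutorService`, used here as a stand-in for a scheduler backed by a single thread (an assumption about your configuration; a thread dump will confirm it). In this sketch a job that never returns, simulated by waiting on a latch much like a read without a timeout, starves a periodic heartbeat job when the pool has one thread, while a second thread keeps the heartbeat running. `HungTaskDemo` is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class HungTaskDemo {
    /** Counts heartbeat runs while another job on the same scheduler is stuck. */
    static int heartbeatsWhileHung(int poolSize, long windowMillis) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger heartbeats = new AtomicInteger();
        try {
            // The "nightly job": blocks until released, occupying one scheduler thread.
            scheduler.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // A job that should run every 20 ms, starting after 50 ms.
            scheduler.scheduleAtFixedRate(heartbeats::incrementAndGet, 50, 20, TimeUnit.MILLISECONDS);
            Thread.sleep(windowMillis);
            return heartbeats.get();
        } finally {
            release.countDown();
            scheduler.shutdownNow();
        }
    }
}
```

A larger pool only hides the problem: each hung run still occupies a thread, so timeouts on the remote calls remain the actual fix.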

This is fairly easy to find out with a thread dump (a stack trace of every thread). There are many posts on how to get one; on a Unix system you run `kill -3` with the Tomcat process ID, and the dump appears in the catalina.out log file.

Once you have the dump, find the scheduler thread and see what it is doing. Is it possible that the task it was executing got stuck?
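If sending a signal to the Tomcat process is not convenient, a dump of selected threads can also be produced from inside the application through the standard `ThreadMXBean`. A minimal sketch: the `dumpThreads` helper is hypothetical, and the thread-name prefix to filter on depends on how your scheduler names its threads, so take one full dump first to find it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadDumps {
    /** Returns a text dump of all live threads whose name starts with the given prefix. */
    static String dumpThreads(String namePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // Include lock details only where the JVM supports them, to avoid UnsupportedOperationException.
        ThreadInfo[] infos = mx.dumpAllThreads(mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            if (!info.getThreadName().startsWith(namePrefix)) {
                continue;
            }
            out.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState()).append('\n');
            if (info.getLockName() != null) {
                out.append("    waiting on ").append(info.getLockName()).append('\n');
            }
            // Unlike ThreadInfo.toString(), print the full stack, not just the top frames.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }
}
```

Calling `dumpThreads("")` returns every live thread, since every name starts with the empty string.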

You can also post the thread dump here for more help.

It is also important to know which scheduler you use. If you use the SimpleAsyncTaskExecutor, it starts a new thread for each task, so a stuck task cannot block the scheduling of later ones. However, if you have tasks that never finish, their threads pile up and you will eventually run out of memory.
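The thread-per-task trade-off can be approximated in plain Java to see both sides of it. This is an illustrative stand-in, not Spring's SimpleAsyncTaskExecutor; `ThreadPerTaskExecutor` is a hypothetical name. Hung tasks never block new ones, but each keeps its thread alive.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative thread-per-task executor (hypothetical; not Spring's implementation). */
final class ThreadPerTaskExecutor implements Executor {
    private final AtomicInteger counter = new AtomicInteger();
    // Kept only so the demo can count live threads.
    private final Queue<Thread> started = new ConcurrentLinkedQueue<>();

    @Override
    public void execute(Runnable task) {
        // A brand-new thread per task: a hung task can never block the next one.
        Thread t = new Thread(task, "task-" + counter.incrementAndGet());
        t.setDaemon(true);
        started.add(t);
        t.start();
    }

    /** Number of threads started by this executor that are still running. */
    int liveThreads() {
        int alive = 0;
        for (Thread t : started) {
            if (t.isAlive()) {
                alive++;
            }
        }
        return alive;
    }
}
```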

http://docs.spring.io/spring/docs/3.0.x/reference/scheduling.html

  • Thanks, taking a thread dump has already been proposed by nicholas.hauschild, and I found a blocked HTTP call to the REST service. I have updated the HttpClient library and wonder whether this alone might already solve the problem. – obecker Mar 17 '14 at 07:43

In my case the stack trace was completely clean: the thread started only a couple of times and that was all. The problem was a conflict with another schedule.

Update

The schedule did not work correctly because I used fixedDelayString and the previous job had not finished when it was time to start the new one. After I changed the schedule to fixedRateString, the threads started correctly.
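The difference between the two settings can be observed with the JDK methods that Spring's `fixedRate` and `fixedDelay` generally map to when the scheduler is backed by a `ScheduledExecutorService`: `scheduleAtFixedRate` plans each start relative to the previous start, while `scheduleWithFixedDelay` waits the given delay after the previous run finishes. A minimal sketch with a 50 ms job and a 50 ms period (`DelayVsRate` is a hypothetical name); with fixed rate the job starts roughly twice as often. In neither mode does the executor start a run while the previous run of the same task is still executing.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DelayVsRate {
    /** Runs a 50 ms job for windowMillis and returns how many runs started. */
    static int countRuns(boolean fixedRate, long windowMillis) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        Runnable job = () -> {
            runs.incrementAndGet();
            try {
                Thread.sleep(50); // simulated work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        if (fixedRate) {
            // Next start planned 50 ms after the previous START (late runs start right away).
            scheduler.scheduleAtFixedRate(job, 0, 50, TimeUnit.MILLISECONDS);
        } else {
            // Next start 50 ms after the previous run FINISHES.
            scheduler.scheduleWithFixedDelay(job, 0, 50, TimeUnit.MILLISECONDS);
        }
        Thread.sleep(windowMillis);
        scheduler.shutdownNow();
        return runs.get();
    }
}
```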

dos4dev