I'm running tasks periodically and to provide flexibility for the intervals, the next timeout is calculated at the end of each task, converted to milliseconds from Instant.now(), and scheduled using ScheduledExecutorService#schedule
.
This code is generally working fine (blue curve on the left), but other days not so good.
It appears to me that things sometimes go bad at startup (machines are restarted every night) and though that the program should, and does, correct itself ScheduledExecutorService#schedule
does not recover and scheduled tasks run late all the time. It appears that a full JVM restart is the only solution.
My initial thoughts was that this was a bug and depending of the time of the machine startup, things could go wrong. But the following log output demonstrates that the problem is related to my usage of ScheduledExecutorService#schedule
:
// Log time in GMT+2, other times are in GMT
// The following lines are written following system startup (all times are correct)
08 juin 00:08:49.993 [main] WARN com.pgscada.webdyn.Webdyn - Scheduling next webdyn service time. Currently 2018-06-07T22:08:49.993Z, last connection null
08 juin 00:08:50.586 [main] INFO com.pgscada.webdyn.Webdyn - The next data sample at 2018-06-07T22:10:00Z and the next FTP connection at 2018-06-07T22:30:00Z
08 juin 00:08:50.586 [main] WARN com.pgscada.webdyn.Webdyn - Completed webdyn schedule in 9ms, next execution at 2018-06-07T22:10:00Z (in 69414 ms) will run as data-sample
// So we are expecting the next execution to occur at 00:10:00 (or in 69.4 seconds)
// Except that it runs at 00:11:21
08 juin 00:11:21.206 [pool-1-thread-4] INFO com.pgscada.webdyn.Webdyn - Executing Webdyn service, isDataSample=true, isFtpConnection=false, nextTimeout=2018-06-07T22:10:00Z, lastFtpConnection=null
// But thats OK because it should correct itself
08 juin 00:13:04.151 [pool-1-thread-4] WARN com.pgscada.webdyn.Webdyn - Scheduling next webdyn service time. Currently 2018-06-07T22:10:00Z, last connection null
08 juin 00:13:04.167 [pool-1-thread-4] INFO com.pgscada.webdyn.Webdyn - The next data sample at 2018-06-07T22:20:00Z and the next FTP connection at 2018-06-07T22:30:00Z
08 juin 00:13:04.167 [pool-1-thread-4] WARN com.pgscada.webdyn.Webdyn - Completed webdyn schedule in 0ms, next execution at 2018-06-07T22:20:00Z (in 415833 ms) will run as data-sample
// So now we are expecting the next execution to occur at 00:20:00 (or in 415.8 seconds)
// But it runs at 00:28:06
08 juin 00:28:06.145 [pool-1-thread-4] INFO com.pgscada.webdyn.Webdyn - Executing Webdyn service, isDataSample=true, isFtpConnection=false, nextTimeout=2018-06-07T22:20:00Z, lastFtpConnection=null
Below is actual production code of the scheduling function.
ScheduledExecutorService EXECUTORS = Executors.newScheduledThreadPool(10);
private void scheduleNextTimeout(Instant currentTime, Instant lastFtpConnection) {
try {
log.info("Scheduling next webdyn service time. Currently {}, last connection {}", currentTime, lastFtpConnection);
// Parse config files first
getConfigIni().parse();
long time = System.nanoTime();
final Instant earliestPossibleTimeout = Instant.now().plusSeconds(5);
Instant nextDataSample = nextTimeout(currentTime);
if (nextDataSample.isBefore(earliestPossibleTimeout)) {
final Instant oldTime = nextDataSample;
nextDataSample = nextTimeout(earliestPossibleTimeout);
log.warn("Next data sample was calculated to a time in the past '{}', resetting to a future time: {}", oldTime, nextDataSample);
}
Instant nextFtp = nextFtpConnection(currentTime, lastFtpConnection);
if (nextFtp.isBefore(earliestPossibleTimeout)) {
final Instant oldTime = nextFtp;
nextFtp = nextFtpConnection(earliestPossibleTimeout, lastFtpConnection);
log.warn("Next FTP connection was calculated to a time in the past '{}', resetting to a future time: {}", oldTime, nextFtp);
}
final boolean isFtpConnection = !nextDataSample.isBefore(nextFtp);
final boolean isDataSample = !isFtpConnection || nextDataSample.equals(nextFtp);
log.info("The next data sample at {} and the next FTP connection at {}", nextDataSample, nextFtp);
final Instant nextTimeout = nextDataSample.isBefore(nextFtp) ? nextDataSample : nextFtp;
final long millis = Duration.between(Instant.now(), nextTimeout).toMillis();
EXECUTORS.schedule(() -> {
log.info("Executing Webdyn service, isDataSample={}, isFtpConnection={}, nextTimeout={}, lastFtpConnection={}",
isDataSample, isFtpConnection, nextTimeout, lastFtpConnection);
long tme = System.nanoTime();
try {
connect(isDataSample, isFtpConnection, nextTimeout, lastFtpConnection);
log.warn("Completed webdyn service in {}s", (System.nanoTime() - tme) / 1000000);
} catch (final Throwable ex) {
log.error("Failed webdyn service after {}ms : {}", (System.nanoTime() - tme) / 1000000, ex.getMessage(), ex);
} finally {
scheduleNextTimeout(nextTimeout, isFtpConnection ? nextTimeout : lastFtpConnection);
}
}, millis, TimeUnit.MILLISECONDS);
log.warn("Completed webdyn schedule in {}ms, next execution at {} (in {} ms) will run as {}",
(System.nanoTime() - time) / 1000000, nextTimeout, millis, isFtpConnection ? "ftp-connection" : "data-sample");
} catch (final Throwable ex) {
log.error("Fatal error in webdyn schedule : {}", ex.getMessage(), ex);
}
}