
We are wondering whether there is a built-in way to warm up services as part of service upgrades in Service Fabric, similar to the various ways you can warm up, for example, IIS-based app pools before they are hit by requests. Ideally, we want the individual services to perform some warm-up tasks as part of their initialization (cache loading, recovery, etc.) before they are considered started and available for other services to contact. This warm-up should be part of the upgrade domain processing: the upgrade process should wait until the warm-up has completed and the service is reported as OK/Ready.

How are others handling such scenarios, that is, controlling how a service signals to Service Fabric that it is fully started and ready to be contacted by other services?

soren.enemaerke

2 Answers


In the health policy there's this concept:

HealthCheckWaitDurationSec: The time to wait (in seconds) after the upgrade has finished on the upgrade domain before Service Fabric evaluates the health of the application. This duration can also be considered as the time an application should be running before it can be considered healthy. If the health check passes, the upgrade process proceeds to the next upgrade domain. If the health check fails, Service Fabric waits for an interval (the UpgradeHealthCheckInterval) before retrying the health check again until the HealthCheckRetryTimeout is reached. The default and recommended value is 0 seconds.

Source

This is a fixed wait period though.

You can also emit health events yourself. For instance, you can report health 'Unknown' while warming up, and adjust your health policy (HealthCheckWaitDurationSec) so that the health check takes this into account.

LoekD
  • Thanks for the reply @LoekD. We've tried this but the upgrade process continues on from the initial upgrade domain regardless of this wait duration. From all we've read so far it seems like we need to emit an Unknown Health event very early and then set it to OK once the warmup has completed. – soren.enemaerke Jun 21 '16 at 07:09

Reporting health can help. You can't report Unknown; you must report Error very early on, then clear the Error when your service is ready. Warning and Ok do not impact the upgrade. To clear the Error, your service can report health state Ok with RemoveWhenExpired=true and a low TTL (read more on how to report).
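The report-Error-then-clear pattern can be sketched roughly as below. This is only an illustration in plain Python: `HealthReport`, `run_with_warmup`, and the `"Warmup"` / `"Readiness"` names are hypothetical stand-ins, not the Service Fabric SDK, and the 5-second TTL is an arbitrary assumption. In a real service the `report` callable would be replaced by whatever health-reporting call your SDK exposes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HealthReport:
    """Hypothetical stand-in for a health report; not an SDK type."""
    source_id: str
    property: str
    state: str                          # "Error" or "Ok" in this sketch
    description: str = ""
    ttl_seconds: Optional[float] = None  # None means the report never expires
    remove_when_expired: bool = False


def run_with_warmup(warm_up: Callable[[], None],
                    report: Callable[[HealthReport], None]) -> None:
    """Report Error before warming up, then clear it with a short-lived Ok."""
    # Report Error as early as possible so the upgrade health check sees it.
    report(HealthReport("Warmup", "Readiness", "Error",
                        description="Warming up in progress"))

    # Cache loading, recovery, etc. If this raises, the Error stays in place.
    warm_up()

    # Same source and property, so this replaces the Error. The low TTL plus
    # remove_when_expired lets the Ok report disappear on its own afterwards.
    report(HealthReport("Warmup", "Readiness", "Ok",
                        description="Warm-up completed",
                        ttl_seconds=5.0,          # assumed value
                        remove_when_expired=True))
```

Passing `report` in as a parameter keeps the ordering logic testable without a cluster; for example, `run_with_warmup(load_cache, reports.append)` collects the two reports in a list.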

You must increase HealthCheckRetryTimeout based on the maximum warm-up time. Otherwise, if a health check is performed and the cluster is evaluated as Error, the upgrade will fail (and roll back or pause, per your policy).

So, the order of events is:

  • your service reports Error - "Warming up in progress"
  • upgrade waits for the fixed HealthCheckWaitDurationSec (you can set this to the minimum warm-up time)
  • upgrade performs health checks: if the service hasn't yet warmed up, the health state is Error, so upgrade retries until either HealthCheckRetryTimeout is reached or your service is not in Error anymore (warm up completed and your service cleared the Error).
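The sequence above can be modelled with a small simulation, which may help when choosing timeout values. It is a simplified sketch and not how Service Fabric is implemented: the function name, the `check_interval` default, and the assumption that the retry timeout starts counting after the fixed wait are all illustrative, and other settings (such as a stable-duration check) are ignored.

```python
def simulate_upgrade_domain(warmup_seconds, wait_duration, retry_timeout,
                            check_interval=1.0):
    """Return (outcome, seconds_elapsed) for one upgrade domain.

    outcome is "proceed" if the service left Error before the retry timeout
    ran out, otherwise "failed" (the upgrade would roll back or pause).
    """
    def service_state(t):
        # The service reports Error from startup until warm-up completes.
        return "Error" if t < warmup_seconds else "Ok"

    t = wait_duration                        # fixed wait, no health evaluation
    deadline = wait_duration + retry_timeout

    while True:
        if service_state(t) != "Error":
            return ("proceed", t)            # move on to the next upgrade domain
        if t + check_interval > deadline:
            return ("failed", t)             # retry timeout reached while in Error
        t += check_interval                  # wait, then re-evaluate health


# Warm-up (30 s) fits inside wait (10 s) + retry timeout (60 s).
print(simulate_upgrade_domain(30, 10, 60, check_interval=5))   # ('proceed', 30)
# Warm-up (120 s) outlasts the retry timeout, so the upgrade fails.
print(simulate_upgrade_domain(120, 10, 60, check_interval=5))  # ('failed', 70)
```

The second call shows why the retry timeout has to cover the maximum warm-up time: the service is still in Error when the last health check runs.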
Oana Platon
  • Sorry for dropping the ball on this, @oana-platon. I've just tested this and have a working solution that will delay the upgrade domain progress until the service goes into OK, controlled by custom health events (see https://github.com/enemaerke/servicefabric-upgradetests). – soren.enemaerke Sep 01 '16 at 12:04