After a day's worth of investigation, I finally figured it out.
Before starting, I should say that the documentation for What Happens During a Swap is correct and is worth reading a few times over to understand it properly.
But apart from that, here is what I discovered and how I worked around the problem.
The documentation says that the first step of a swap is to apply the production settings to the staging slot and warm it up. This means there is a period of time during which both the production slot and the staging slot run the app with production settings. That is the problem I'm trying to fix: I want one and only one instance of a "production" app to be running at any one time.
In order to detect whether my app is running in the staging slot with production settings, I check the WEBSITE_HOSTNAME environment variable. Before the swap this has the value xxxx.azurewebsites.net on the production slot and xxxx-staging1.azurewebsites.net on the staging slot. If I discover that I am running on the staging slot with production settings, I hold back on accessing shared resources until the swap is done.
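The check described above can be isolated into a small pure function, which makes it easy to unit test. This is only a sketch: the SwapDetection class and its method name are hypothetical, and it assumes the app has some setting of its own (here a string that equals "Production" under production settings) that tells it which settings it is running with. Unlike a plain != comparison, it compares hostnames case-insensitively.

```csharp
using System;

public static class SwapDetection
{
    // Hypothetical helper: returns true when the app runs with production
    // settings but WEBSITE_HOSTNAME is not the production hostname, which is
    // the warm-up phase in the staging slot during a swap.
    public static bool IsWarmingUpInStaging(
        string? websiteHostname,
        string deploymentEnvironment,
        string productionHostname = "xxxx.azurewebsites.net")
    {
        var hasProductionSettings = deploymentEnvironment == "Production";
        var onProductionHost = string.Equals(
            websiteHostname, productionHostname, StringComparison.OrdinalIgnoreCase);
        return hasProductionSettings && !onProductionHost;
    }
}
```

At startup it would be called with Environment.GetEnvironmentVariable("WEBSITE_HOSTNAME") as the first argument. Note that a missing hostname (for example when running locally with production settings) also counts as "not on the production host".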
After the swap is complete, WEBSITE_HOSTNAME has the value xxxx-staging1.azurewebsites.net in both slots, so it is impossible to detect when the swap is done using this variable.
In fact, I have not found any other way to automatically detect when the swap is complete, so I created an endpoint that has to be triggered manually. This endpoint changes the value of WEBSITE_HOSTNAME for the running process and also releases the production functionality that is waiting for the swap to complete.
NOTE: Setting the value of WEBSITE_HOSTNAME is important because Application Insights telemetry uses it to derive the cloud_RoleName property.
To achieve that, I created a singleton service called AzureSwap that facilitates all this:
    using System;
    using System.Threading.Tasks;
    using Microsoft.Extensions.Options;

    public class AzureSwap {
        private readonly IOptions<AppOptions> _appOptions;
        // Run continuations asynchronously so that waiting services do not
        // resume inline on the thread that calls SetSwapDone()
        private readonly TaskCompletionSource _swapDoneTcs =
            new(TaskCreationOptions.RunContinuationsAsynchronously);

        public AzureSwap(IOptions<AppOptions> appOptions) {
            _appOptions = appOptions;
            var azureWebsiteHostname =
                Environment.GetEnvironmentVariable("WEBSITE_HOSTNAME");
            var myEnv = appOptions.Value.DeploymentEnvironment;
            if (azureWebsiteHostname != "xxxx.azurewebsites.net" &&
                myEnv == "Production") {
                // Production settings on a non-production hostname:
                // swap is in progress...
                IsSwapping = true;
            }
            else {
                // No swap in progress, or the swap has completed
                SetSwapDone();
            }
        }

        public bool IsSwapping { get; private set; }

        // This is called by the HTTP endpoint
        public void SetSwapDone() {
            var myEnv = _appOptions.Value.DeploymentEnvironment;
            var hostname = myEnv switch {
                "Production" => "xxxx.azurewebsites.net",
                "Staging1" => "xxxx-staging1.azurewebsites.net",
                _ => throw new Exception($"Unknown environment {myEnv}"),
            };
            // Only affects the current process; it is not persisted to the slot
            Environment.SetEnvironmentVariable("WEBSITE_HOSTNAME", hostname);
            IsSwapping = false;
            _swapDoneTcs.TrySetResult();
        }

        // This is waited upon by production services
        public async Task WaitSwapDone() {
            await _swapDoneTcs.Task;
        }
    }
And to use that, for example in an IHostedService, I do this:
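    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Extensions.Hosting;

    public class MyHostedService : IHostedService {
        private readonly AzureSwap _azureSwap;

        public MyHostedService(AzureSwap azureSwap) {
            _azureSwap = azureSwap;
        }

        public Task StartAsync(CancellationToken cancellationToken) {
            // Fire and forget, so host startup is not blocked while the swap is pending
            _ = Task.Run(async () => {
                await _azureSwap.WaitSwapDone();
                // Start long running service operation
            }, cancellationToken);
            return Task.CompletedTask;
        }

        // Required by IHostedService; nothing to clean up in this example
        public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
    }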
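For completeness, here is roughly how the pieces could be wired together, including the manually triggered endpoint. Treat it as a sketch rather than my exact code: it assumes ASP.NET Core minimal hosting (.NET 6 or later, with implicit usings), that AzureSwap, MyHostedService and AppOptions live in the same project, and the configuration section name "App" and the route "/internal/swap-done" are made up for illustration.

```csharp
// Program.cs
var builder = WebApplication.CreateBuilder(args);

// "App" is a hypothetical configuration section that binds to AppOptions
// (which exposes the DeploymentEnvironment value used by AzureSwap).
builder.Services.Configure<AppOptions>(builder.Configuration.GetSection("App"));

// One shared instance, so the endpoint and the waiting services see the same state.
builder.Services.AddSingleton<AzureSwap>();
builder.Services.AddHostedService<MyHostedService>();

var app = builder.Build();

// Hypothetical route: call it manually once the swap is reported as complete.
// In a real deployment it should be protected (authentication, IP restriction, etc.).
app.MapPost("/internal/swap-done", (AzureSwap azureSwap) =>
{
    azureSwap.SetSwapDone();
    return Results.Ok();
});

app.Run();
```

Registering AzureSwap as a singleton matters here: the hosted service awaits the same instance that the endpoint later completes.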