Azure deployment slot swap - both slots use production settings

Question

I am using deployment slot settings to distinguish between my main (production) slot and the staging slot.

When I perform a swap, the new production app (that was in staging before the swap) properly reads app settings.

~~However, the new staging app (which was the production app before the swap) DOES NOT re-read app settings and continues to use production settings.~~

(EDIT): It turns out the real problem was twofold. First, the staging slot ran with production settings for a while, in parallel with the production slot, and this persisted until the end of the swap process, when the old production site was restarted with the staging settings. Second, WEBSITE_HOSTNAME kept the staging slot's value even after the swap, and stayed that way until a restart (which incurs downtime).

I am using IOptionsMonitor<MyOptions> and registering a change handler with myOptionsMonitor.OnChange(...), but nothing works. The new staging app (old production app) always starts and reads production settings.

Only if I later manually restart the staging app does it get the correct settings again.

This causes race conditions when two apps run as production and compete for the same resources.

What am I doing wrong? How can I make this work properly?

Clarification based on comment: The new staging (old production) app is restarted as a result of the swap, but it reads the production settings. I verify this by logging the settings in the constructor of my singleton service.

Both slots should be restarted as part of the swap, so it is weird that you are not getting new settings even with a restart https://learn.microsoft.com/en-us/azure/app-service/deploy-staging-slots#what-happens-during-a-swap — Alex AIT, Mar 09 '23 at 13:06
They ARE restarted, but the new staging reads production settings for some reason. I am putting log messages in singleton constructors and I can see that it does restart, but I can also see production settings instead of staging ones.... — Aviad P., Mar 09 '23 at 14:17
There's a big mess going on here, WEBSITE_HOSTNAME on the production site is set to the staging site hostname after swap, but so is the one on the staging website... — Aviad P., Mar 10 '23 at 13:41
Another thing I realized - after the swap, both slots are identical, but with internal inconsistencies. Both have the WEBSITE_HOSTNAME of the staging slot, but both have the .NET configuration of the main slot... — Aviad P., Mar 10 '23 at 16:24
I think I know what's going on - according to the docs, all production settings are applied to the staging slot prior to the swap to make sure it works. So during this time there are two "production" instances running, and they can be distinguished from each other by WEBSITE_HOSTNAME. After the staging warmup completes, the slots are swapped by changing Azure routing rules. After this stage, the two apps are even more indistinguishable because both have the WEBSITE_HOSTNAME of the staging site. Only after this step completes does the new staging slot (old production) get its new config... maybe — Aviad P., Mar 10 '23 at 16:46
Yes this is what happens, but the new production slot never gets a new WEBSITE_HOSTNAME so there is no way for it to detect when the swap has completed... — Aviad P., Mar 10 '23 at 20:00
Looks like the only way to get around this is to prepare some hook in the application itself such as an HTTP endpoint that once hit, will cause the application to again "trust" its .NET configuration despite WEBSITE_HOSTNAME being wrong — Aviad P., Mar 10 '23 at 20:19

Aviad P. · Accepted Answer · 2023-03-10T22:55:01.137

After a day's worth of investigation, I finally figured it out.

Before starting, I should say that the documentation for What Happens During a Swap is correct and should be read a few times over to be properly understood.

But apart from that, here is what I discovered and how I worked around the problem.

So, the documentation says that during a swap, the first step is to apply production settings to the staging slot and warm it up. This means that there is a period of time during which both the production slot and the staging slot run the app with production settings - which is the problem I'm trying to fix - I want one and only one instance of a "production" app to be running at any one time.

In order to detect whether my app is running in the staging slot with production settings I check the WEBSITE_HOSTNAME environment variable. Before the swap this has the value of xxxx.azurewebsites.net on the production slot and xxxx-staging1.azurewebsites.net on the staging slot. If I discover that I am running on the staging slot with production settings, I hold back on accessing shared resources until the swap is done.
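As a minimal sketch of that check, the decision can be isolated in a small pure function. The names SlotDetection and IsStagingWithProductionSettings are hypothetical and not part of the original code, and the case-insensitive comparison is an assumption added here because hostnames are not case-sensitive:

```csharp
using System;

public static class SlotDetection {
    // Returns true when the app has production settings but is NOT running
    // under the production hostname, i.e. it is most likely the staging
    // slot being warmed up with production settings before a swap.
    public static bool IsStagingWithProductionSettings(
        string? websiteHostname,        // value of WEBSITE_HOSTNAME, may be null
        string deploymentEnvironment,   // e.g. "Production" or "Staging1"
        string productionHostname) {    // e.g. "xxxx.azurewebsites.net"
        var hasProductionSettings = deploymentEnvironment == "Production";
        var onProductionHost = string.Equals(
            websiteHostname, productionHostname,
            StringComparison.OrdinalIgnoreCase);
        return hasProductionSettings && !onProductionHost;
    }
}
```

Keeping the check free of environment access makes it easy to unit test; the caller passes in Environment.GetEnvironmentVariable("WEBSITE_HOSTNAME") and the configured environment name.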

After the swap is complete, WEBSITE_HOSTNAME will have the value of xxxx-staging1.azurewebsites.net in both slots - so it is impossible to detect when the swap is done using this variable.

In fact, I have not found a way to automatically detect when the swap is complete any other way, so I created an endpoint that has to be triggered manually. This endpoint manually changes the value of WEBSITE_HOSTNAME and also releases the production functionality that is waiting for the swap to complete.

NOTE: Setting the value of WEBSITE_HOSTNAME is important because it is used by Application Insights telemetry to derive the cloud_RoleName property.

To achieve that I created a singleton service called AzureSwap that facilitates all this:

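using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Options;

public class AzureSwap {
    private readonly IOptions<AppOptions> _appOptions;
    // Run continuations asynchronously so that waiting services do not
    // resume inline on the thread that calls SetSwapDone()
    private readonly TaskCompletionSource _swapDoneTcs =
        new(TaskCreationOptions.RunContinuationsAsynchronously);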

    public AzureSwap(IOptions<AppOptions> appOptions) {
        _appOptions = appOptions;
        var azureWebsiteHostname =
            Environment.GetEnvironmentVariable("WEBSITE_HOSTNAME");
        var myEnv = appOptions.Value.DeploymentEnvironment;
        if (azureWebsiteHostname != "xxxx.azurewebsites.net" &&
            myEnv == "Production") {
            // Swap is in progress...
            IsSwapping = true;
        }
        else {
            // Swap has completed
            SetSwapDone();
        }
    }

    public bool IsSwapping { get; private set; }

    // This is called by the HTTP endpoint
    public void SetSwapDone() {
        var myEnv = _appOptions.Value.DeploymentEnvironment;
        var hostname = myEnv switch {
            "Production" => "xxxx.azurewebsites.net",
            "Staging1" => "xxxx-staging1.azurewebsites.net",
            _ => throw new Exception($"Unknown environment {myEnv}"),
        };

        Environment.SetEnvironmentVariable("WEBSITE_HOSTNAME", hostname);

        IsSwapping = false;
        _swapDoneTcs.TrySetResult();
    }

    // This is waited upon by production services
    public async Task WaitSwapDone() {
        await _swapDoneTcs.Task;
    }
}

And to use that, for example in an IHostedService, I do this:

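public class MyHostedService : IHostedService {
    private readonly AzureSwap _azureSwap;

    public MyHostedService(AzureSwap azureSwap) {
        _azureSwap = azureSwap;
    }

    public Task StartAsync(CancellationToken cancellationToken) {
        // Fire and forget so that host startup is not blocked by the swap
        _ = Task.Run(async () => {
            await _azureSwap.WaitSwapDone();

            // Start long running service operation

        }, cancellationToken);
        return Task.CompletedTask;
    }

    public Task StopAsync(CancellationToken cancellationToken) {
        return Task.CompletedTask;
    }
}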
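
The manually triggered HTTP endpoint mentioned earlier is not shown above. A rough sketch of how it might be wired up with an ASP.NET Core minimal API (.NET 6 or later) follows. It assumes the AzureSwap class above is in the same project. The route /internal/swap-done, the "App" configuration section name, and the shape of AppOptions are assumptions made for illustration, not taken from the original code:

```csharp
// Program.cs - sketch only; relies on the AzureSwap class shown above
var builder = WebApplication.CreateBuilder(args);

// "App" is an assumed configuration section name
builder.Services.Configure<AppOptions>(builder.Configuration.GetSection("App"));
builder.Services.AddSingleton<AzureSwap>();

var app = builder.Build();

// Hypothetical route, to be called once the swap has completed.
// It should be protected, because it releases production functionality.
app.MapPost("/internal/swap-done", (AzureSwap azureSwap) => {
    azureSwap.SetSwapDone();
    return Results.Ok();
});

app.Run();

// Assumed shape of the options class, inferred from how it is used above
public class AppOptions {
    public string DeploymentEnvironment { get; set; } = "";
}
```

Because a swap exchanges the slots, the request has to reach the instance that now serves production, so it should be sent to the production hostname after the swap has finished.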
