I have an MVC 5 web application running on .NET 4.7.2 and hosted in an Azure AppService, which uses Azure Key Vault to hold secrets. The project uses the Microsoft.Azure.KeyVault 3.0.3 NuGet package, and the secrets are accessed using the KeyVaultClient and .GetSecretAsync(). All resources are located in the same Azure region.

For the most part this works very well, and about 90% of the time it returns the secret in milliseconds.

[Screenshot: Working Key Vault Access]

But every now and then the call to access the Key Vault fails. This doesn't manifest itself as an exception thrown by the SDK, but the web app hangs. Eventually - and normally in around 1 minute but sometimes longer - the secret is returned and all is fine again. This is because the SDK uses a retry pattern, which will keep trying to get the secret.

Looking at Application Insights for the AppService I can see that the GET request generated by the SDK gets an HTTP 500 response from the Key Vault and a SocketException is thrown, with a result code of ConnectFailure.

[Screenshot: Exception]

The exception is:

[Screenshot: Exception]

Looking at the telemetry and stepping through the code, I can find no common factor or obvious cause. It seems to be entirely random.

The bottom line is the Azure hosted AppService sometimes cannot connect to an Azure hosted Key Vault in the same datacentre, using the latest framework and SDK version.

Has anyone else seen this, or does anyone have any idea what causes it? I've searched around and found a few instances of people experiencing the same issue, but nobody has a cause or solution.

EDIT (1): I have now tried spinning up a new Key Vault in a different region entirely, and the problem remains exactly the same.

Ira Rainey

2 Answers

We experienced the same behavior on our project: KeyVault would be fast and reliable most of the time, and then intermittently stop responding or take a very long time to return, with no obvious reason why. This occurred in all tiers of our application, from the API, to Azure Functions, to command line tools.

Eventually, we had to work around this by caching secrets in memory to avoid hitting the KeyVault too often. Our AppSettings class caches the values internally, and we also configured our DI container to treat this class as a singleton.

Here is a very simplified example:

using System;
using System.Runtime.Caching;
using System.Threading.Tasks;
using Microsoft.Azure.KeyVault;
using Microsoft.Azure.KeyVault.Models;

public class MyAppSettings : IAppSettings
{
    private readonly ObjectCache _cache = MemoryCache.Default;
    private readonly object _lock = new object();
    private KeyVaultClient _kvClient;

    public string MySecretValue => GetSecret("MySecretValue");

    private KeyVaultClient GetKeyVaultClient()
    {
        // Initialize _kvClient if required

        return _kvClient;
    }

    private string GetSecret(string name)
    {
        lock (_lock)
        {
            // Serve from the in-memory cache when the secret is already there
            if (_cache.Contains(name))
                return (string) _cache.Get(name);

            // Sanitize name if required, remove reserved chars

            // Construct path
            var path = "...";

            // Get value from KV
            // Task.Run keeps the blocking wait off the ASP.NET synchronization context
            var kvClient = GetKeyVaultClient();
            Task<SecretBundle> task = Task.Run(async () => await kvClient.GetSecretAsync(path));

            // SecretBundle.Value holds the secret string itself
            var value = task.Result.Value;

            // Cache it for one hour
            _cache.Set(name, value, DateTimeOffset.UtcNow.AddHours(1));

            return value;
        }
    }
}

This isn't production ready - you'll need to modify this and implement the GetKeyVaultClient method to actually return your KeyVaultClient object, and also the GetSecret method should sanitize the key name being retrieved.

In our DI registry, we registered it as a singleton like this:

For<IAppSettings>().Use<MyAppSettings>().Singleton();

These two changes seemed to work well for us, and we haven't had any issues with this for a while now.
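
For callers that can be made async, the same idea can be sketched without blocking on a lock. This is only an illustration under assumptions, not what we run in production: `SecretCache`, its `fetch` delegate, and the `ttl` and `timeout` parameters are hypothetical names, and the timeout only helps if the fetch delegate actually honors the cancellation token. A fetch that fails or times out is dropped from the cache, so the next caller tries again instead of being stuck with the failure.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of the Key Vault SDK.
public sealed class SecretCache
{
    private sealed class Entry
    {
        public Lazy<Task<string>> Value;
        public DateTimeOffset ExpiresAt;
    }

    private readonly ConcurrentDictionary<string, Entry> _entries =
        new ConcurrentDictionary<string, Entry>();
    private readonly Func<string, CancellationToken, Task<string>> _fetch;
    private readonly TimeSpan _ttl;
    private readonly TimeSpan _timeout;

    public SecretCache(
        Func<string, CancellationToken, Task<string>> fetch,
        TimeSpan ttl,
        TimeSpan timeout)
    {
        _fetch = fetch;
        _ttl = ttl;
        _timeout = timeout;
    }

    public async Task<string> GetAsync(string name)
    {
        var now = DateTimeOffset.UtcNow;

        // Reuse a live entry, otherwise install a fresh one. Lazy ensures the
        // fetch only starts for the entry that ends up in the dictionary.
        var entry = _entries.AddOrUpdate(
            name,
            _ => NewEntry(name, now),
            (_, existing) => existing.ExpiresAt > now ? existing : NewEntry(name, now));

        try
        {
            return await entry.Value.Value.ConfigureAwait(false);
        }
        catch
        {
            // Drop whatever is cached under this name so a later call retries.
            Entry removed;
            _entries.TryRemove(name, out removed);
            throw;
        }
    }

    private Entry NewEntry(string name, DateTimeOffset now)
    {
        return new Entry
        {
            ExpiresAt = now + _ttl,
            Value = new Lazy<Task<string>>(() => FetchWithTimeoutAsync(name))
        };
    }

    private async Task<string> FetchWithTimeoutAsync(string name)
    {
        // The token is cancelled after _timeout; the delegate must observe it.
        using (var cts = new CancellationTokenSource(_timeout))
        {
            return await _fetch(name, cts.Token).ConfigureAwait(false);
        }
    }
}
```

With the Key Vault SDK, the fetch delegate could wrap the `GetSecretAsync` overload that takes a `CancellationToken` and return the bundle's `Value`. I have not verified how promptly the SDK's retry loop reacts to cancellation, so treat the timeout as best effort.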

Mun
  • Thanks for this. I did consider doing something along these lines this afternoon, but occasionally I also see it hang on first access when the application starts up. That would still create problems getting the secrets to cache in the first place. Have you ever experienced it on startup? – Ira Rainey Feb 05 '19 at 20:53
  • @IraRainey It's been a while since we had this issue, but I don't recall seeing it on application startup. With that said though, there isn't anything in our application that needs KeyVault on startup. We only need to get something from KeyVault when responding to an API request, service bus message, etc, at which point the application is already running. – Mun Feb 05 '19 at 20:59
  • OK thanks. Actually, thinking about it, it's less about app startup and more about after login, which uses AAD. Once the user is authenticated we need to fetch some secrets. I've seen it fail there, which makes it sometimes look like the login has failed. – Ira Rainey Feb 05 '19 at 21:06
  • We had some on-behalf-of authentication flows failing intermittently in our application. We were pinging the login servers a lot to authenticate for a Graph API call, and saw some very similar behavior. I was going to suggest a caching approach as well (even though our problem was tangential). Is it possible these service requests are rate-limited? – Scuba Steve Feb 05 '19 at 23:49
  • @ScubaSteve Key Vault is rate limited, but the rate is pretty high, and we're only making a small number of calls. According to the documentation, if it fails due to rate limits it will return a 429 response code, rather than 500. – Ira Rainey Feb 06 '19 at 06:55
  • My only other thinking is that this is related to the world-wide outages in their authentication pipeline, so it may be transient. Regardless, I think some kind of caching approach might sort you out. How long are the keys valid for? – Scuba Steve Feb 06 '19 at 18:57
  • @ScubaSteve Not sure. I tried it in a different region today and the issue was the same too. I've looked at it and have implemented a different pattern as a kind of workaround, where I only read them once, but it bothers me that in theory that could still develop an issue. I'm going to log a support ticket with MS to see what they say, as it's clearly a bug. – Ira Rainey Feb 06 '19 at 19:30
  • @IraRainey You could look up the MS GitHub page for the library you're using. Developers are usually watching the issue tracker pretty closely. – Scuba Steve Feb 07 '19 at 19:24
  • Do you guys have anything new about this topic? We're experiencing the same issue. Caching is an idea, but what if the initial cache request fails? – Remy Jun 09 '20 at 16:49

Another option is to have your deployment pipeline copy the secrets from Key Vault into your App Service application as app settings.

Pros:

  • Keep the secrets out of source control
  • Remove the runtime dependency on Key Vault
  • Faster, more reliable local access to the secrets

Cons:

  • Updating the secrets requires a redeploy
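
Reading such a setting at runtime could look roughly like the sketch below. It relies on App Service exposing app settings to the process as environment variables; `DeployedSettings` and `GetRequired` are hypothetical names, and the setting name `MySecretValue` is just borrowed from the other answer. It assumes the pipeline has already written the setting.

```csharp
using System;

// Hypothetical helper; the name and shape are illustrative only.
public static class DeployedSettings
{
    // Returns the named app setting, or throws if the pipeline did not set it.
    public static string GetRequired(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrEmpty(value))
            throw new InvalidOperationException("Missing app setting '" + name + "'.");
        return value;
    }
}
```

Failing loudly on a missing value makes a broken deployment show up at first use rather than as a confusing error later on. In an MVC 5 app, reading the same key through `ConfigurationManager.AppSettings` should also work, since App Service is documented to override matching appSettings entries.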
Joe Eng
  • Sorry, but isn't the biggest con that the secret is no longer stored securely? Isn't the whole point of Key Vault the FIPS security it provides? – Jan Martin May 17 '22 at 05:44
  • For func apps to even work you need to set AzureWebJobsStorage (a secret in app settings) to your storage account conn string. Maybe there are better ways of doing it now (managed identity?), but this is pretty much MS advertising app settings as an appropriate place to store secrets. – Joe Eng May 19 '22 at 01:18