3

I have a web app in production (.Net Core), I deployed it in Azure as App service which is in premium tier p2v2 4 instances. I am also using Azure Redis cache (Premium Tier) which my app is using it as cache. I have two app services (primary and secondary) configured Traffic Manager for load balancing.

Whenever I am trying to deploy my app into production using swap slot feature, Both the app service response time goes up to 20 secs and it is down for around 1 minute and my CPU utilization goes close to 90%. And I am seeing multiple exceptions from Redis client (For ex: No connection is available to service this operation: EVAL; It was not possible to connect to the Redis server(s). To create a disconnected multiplexer, disable AbortOnConnectFail. ConnectTimeout; IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=452,Free=32315,Min=8,Max=32767), Local-CPU: n/a) and my HttpQueue length goes above 10

I can infer from the above image is that worker thread has been overloaded, Donno why it is happening

I am using .Net StackExchange Redis client version 2.0.601, recently did an update from version 1.2.4

Note: I didn't use slot specific app setting. It keeps happening for every swap slots during deployment I didn't find any app service restart in the logs.

I want to know any of you guys are facing this issue, if yes please suggest me where is the problem or how to debug and it would also better if you can share any of things you tried.

I tried to find any error logs in AZure Redis cache server but couldn't find any.

I am trying to figure out what is causing this issue, how to debug this kind of issues with azure, and whether anybody encountered the same and have implemented any resolution for the same?

Please let me know if you need any additional details.

Sarva Raghavan
  • 88
  • 1
  • 1
  • 11

1 Answers1

1

Here is something which might be worth trying :

Cache metrics are reported using several reporting intervals, including Past hour, Today, Past week, and Custom. The Metric blade for each metrics chart displays the average, minimum, and maximum values for each metric in the chart, and some metrics display a total for the reporting interval.

Each metric includes two versions. One metric measures performance for the entire cache, and for caches that use clustering, a second version of the metric that includes (Shard 0-9) in the name measures performance for a single shard in a cache. For example if a cache has 4 shards, Cache Hits is the total amount of hits for the entire cache, and Cache Hits (Shard 3) is just the hits for that shard of the cache.

Try looking for the Error metric while monitoring.

https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-how-to-monitor#available-metrics-and-reporting-intervals

enter image description here

Additionally , we need to retry for TimeoutException, RedisConnectionException or SocketException even which ensure it will try to connect in case of any exception, you can read about all the best practises arouns Redis Cache usage in below doc:

https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices

https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices#when-is-it-safe-to-retry

Hope it helps.

Mohit Verma
  • 5,140
  • 2
  • 12
  • 27