We had an issue with one of our four Azure Cloud Service deployments in the NA region. All four deployments are hosted behind a Traffic Manager profile.

The deployment (endpoint) with the issue was also receiving more traffic than the other endpoints. We made the following changes in Traffic Manager:

  1. First we reduced its weight to 10 while the other three kept a weight of 25. In spite of this, the faulty endpoint still received more traffic than the others, though its traffic dropped a little.
  2. Then we reduced its weight to 1 while the others stayed at 25. It still received more traffic than the others, though its traffic dropped again.
  3. Finally we disabled the faulty endpoint in Traffic Manager, but it continued to receive traffic for the next 6+ hours.
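
For reference, the share of DNS responses each endpoint would be expected to receive under weighted routing, ignoring caching, can be worked out from the weights above. This is only a rough sketch: the endpoint names are made up, and it assumes the expected share is simply the weight divided by the sum of all weights.

```python
def expected_shares(weights):
    """Return each endpoint's expected fraction of DNS responses.

    Assumes the share is weight / sum(weights) and that no caching is involved.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one endpoint needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical endpoint names; the weights are the ones from steps 1 and 2.
step1 = expected_shares({"faulty": 10, "b": 25, "c": 25, "d": 25})
step2 = expected_shares({"faulty": 1, "b": 25, "c": 25, "d": 25})

print(f"step 1: {step1['faulty']:.1%}")  # 10/85, about 11.8%
print(f"step 2: {step2['faulty']:.1%}")  # 1/76, about 1.3%
```

So after each change the faulty endpoint should have received far less traffic than any other endpoint, which is not what we observed.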
Minhaj Patel
Saby

1 Answer

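Sometimes a client keeps resolving to the same endpoint for a period of time because the DNS response is cached, either locally or by the client's caching name server. You can run ipconfig /flushdns from a Command Prompt opened as an administrator to clear the local cache, then verify whether the faulty endpoint still receives traffic. Note that this only clears the cache on that machine, not the cache of the recursive DNS server it uses.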
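
To see which address a client actually resolves to, something like the following could be used. It is a minimal sketch, not an official tool: `sample_resolutions` and the profile name in the comment are hypothetical, and the lookups go through the operating system resolver, so repeated calls may simply return the same cached answer.

```python
import socket
from collections import Counter


def sample_resolutions(hostname, attempts=5, resolve=socket.gethostbyname_ex):
    """Resolve hostname several times and count the IP addresses returned.

    `resolve` is injectable so the function can be exercised without a network.
    """
    seen = Counter()
    for _ in range(attempts):
        _canonical, _aliases, addresses = resolve(hostname)
        seen.update(addresses)
    return seen


# Example call (hypothetical profile name, needs network access):
# print(sample_resolutions("myprofile.trafficmanager.net"))
```

If the faulty endpoint's address keeps showing up after a flush, the stale answer is probably coming from a cache further upstream.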

You could also change the DNS time to live (TTL) to a value that best balances your needs. Shorter values result in faster cache expiry and thus more round-trips to the Traffic Manager name servers. Longer values mean that it can take longer to direct traffic away from a failed endpoint. Refer to How Traffic Manager Works.

Moreover, the Traffic Manager does not receive DNS queries directly from clients. Rather, DNS queries come from the recursive DNS service that the clients are configured to use. For each DNS query received, Traffic Manager randomly chooses an ONLINE endpoint.

It is important to understand that DNS responses are cached by clients and by the recursive DNS servers that the clients use to resolve DNS names. This caching can have an impact on weighted traffic distributions. When the number of clients and recursive DNS servers is large, traffic distribution works as expected. However, when the number of clients or recursive DNS servers is small, caching can significantly skew the traffic distribution.
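
The effect of caching on a weighted distribution can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not a model of Traffic Manager itself: every recursive resolver asks once and caches its answer for the whole run, every client picks one resolver at random, and all names and parameters are hypothetical.

```python
import random
from collections import Counter


def simulate_cached_weighted_routing(weights, resolvers, clients, seed=0):
    """Return the fraction of clients that end up on each endpoint.

    Each resolver draws one weighted answer and caches it; each client
    then receives whatever its randomly chosen resolver has cached.
    """
    rng = random.Random(seed)
    names = list(weights)
    weight_values = [weights[name] for name in names]
    # One weighted draw per resolver, cached for the whole simulation.
    cached = [rng.choices(names, weights=weight_values, k=1)[0]
              for _ in range(resolvers)]
    # Every client is served from a randomly chosen resolver's cache.
    hits = Counter(cached[rng.randrange(resolvers)] for _ in range(clients))
    return {name: hits[name] / clients for name in names}


weights = {"faulty": 1, "b": 25, "c": 25, "d": 25}
few = simulate_cached_weighted_routing(weights, resolvers=3, clients=10_000)
many = simulate_cached_weighted_routing(weights, resolvers=5_000, clients=10_000)
print("few resolvers: ", few)
print("many resolvers:", many)
```

With only a handful of resolvers, whole endpoints can end up with no traffic while another takes a disproportionate share. With many resolvers, the shares tend to move toward the configured weights.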

Ref: Weighted traffic-routing method

Update

Another factor might be the Traffic Manager health probe timing. The number of Traffic Manager health checks reaching your endpoint depends on the monitoring interval and the number of locations from which the health checks originate. But detecting an endpoint change this way usually takes only a short time, on the order of a few seconds.
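
As a rough back-of-the-envelope estimate of that probe load, assuming each probing location checks the endpoint once per monitoring interval (the number of locations below is made up, and the 30-second interval is an assumed setting, not one taken from the question):

```python
def probes_per_minute(interval_seconds, probe_locations):
    """Estimate health checks per minute hitting one endpoint.

    Assumes every probing location sends one check per monitoring interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return probe_locations * 60 / interval_seconds


# Hypothetical values: 30-second interval, 10 probing locations.
print(probes_per_minute(30, 10))  # 20.0
```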

Cloud Service 'staging' slots can be configured in Traffic Manager as External endpoints. Because the External endpoint type is in use, changes to the underlying service are not picked up automatically. With external endpoints, Traffic Manager cannot detect when the Cloud Service is stopped or deleted.

You can go through FAQs to get more details.

Nancy
  • Thanks for your reply. I also think DNS caching plays a role here. Does that mean performance-based Traffic Manager doesn't work in real time? What do I do when I have to divert traffic away from one of my deployments/endpoints suddenly? – Saby Sep 17 '18 at 18:38
  • I just tested TM with two Web App services in different locations using the weighted traffic-routing method. It worked fine within a few seconds of disabling one endpoint. Also, check my update. Can you change to a performance-based Traffic Manager profile? Do you get the same test result on your side? – Nancy Sep 18 '18 at 11:04
  • You also could [verify Traffic Manager settings](https://learn.microsoft.com/en-us/azure/traffic-manager/traffic-manager-testing-settings) again. Please let me know if I can help further or if this helps you, you can accept it for other references. – Nancy Sep 19 '18 at 01:57
  • Thanks for your reply. I will cross check on this and will get back – Saby Sep 19 '18 at 17:54
  • @Saby Please let me know if you need more info on this case, or you can accept it as an answer so that it might help others who run into the same scenario. – Nancy Sep 24 '18 at 09:55
  • I don't see any option of accepting your answer – Saby Sep 28 '18 at 14:00
  • @Saby Ah, you can click the check mark to the left of my initial reply, under the "answer" vote count. – Nancy Sep 29 '18 at 00:14
  • Thanks for pointing that out. I have marked the answer as accepted :) – Saby Sep 30 '18 at 16:43