When configuring a Citrix Netscaler for use with Exchange 2010, what should the persistence timeout be?

Question

I have a Netscaler that I'm using to front end 4 Exchange 2010 CAS servers. The clients that use this virtual server include

Outlook Web Access (OWA)
Exchange Web Services (Outlook for Mac 2011)
Outlook Anywhere (aka RPC/HTTPS)
Activesync

The special thing about Outlook Anywhere is that Outlook 2010 SP2 and newer use cookies to maintain state. Older versions of Outlook don't use cookies and fall back to maintaining state by IP.

This is my current configuration:

[screenshot of the configuration; image not preserved]

Issues

If I reboot a single CAS server in the array, the clients never fully recover and experience connection delays and issues until all CAS servers are reset at the same time. (todo: attempt to clear the "session table" on the Netscaler instead of rebooting the CAS servers at the same time)
When I first looked at this server, the Cookie Insert persistence timeout was set to zero, which means infinity. As a result, cookie-based sessions were never rebalanced according to 'Least Connection'. 3 CAS servers had 100% CPU while one CAS had almost nothing.
I decreased the cookie insert timeout from 0 to 5 (that is, from infinity to 5 minutes) and users were getting kicked out of OWA even while composing a new email (a bad thing, since the message was lost).
Some RPC/HTTPS users were getting sporadic disconnects (not sure if it was 2010 SP2 or newer, but I suspect it was) (cached mode or non-cached)
Many of our users are behind a NAT pool, meaning that everyone who shares the same source IP lands on the same CAS server. If that user population is 100+ users, then every time the session times out, all of those users are moved to a new CAS server together (shuffling around a huge load).
If two large user populations behind a NAT land on the same CAS server, then that CAS server will get overloaded and reach 100% CPU
I think it's possible that if a NAT'd population of users has a mixture of cookie clients (OWA / O2010 SP2) and non-cookie clients (older than O2010 SP1), the Netscaler may conflate the two persistence techniques and mess up.
Keeping the last bullet point in mind, and the fact that when I reboot a single CAS server Outlook Anywhere / HTTP acts weird until I reboot all CAS servers, I think that the act of rebooting splits Outlook Anywhere calls among different CAS servers and persistence is all screwed up. (Perhaps Outlook 2010 doesn't work correctly with cookies for all RPC calls such as FreeBusy / Calendar lookups / Normal Outlook traffic)
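
The NAT concern in the bullets above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how the Netscaler is actually implemented: it assumes that the first connection from a source IP goes to the server with the fewest connections and that every later connection from that IP sticks to the same server. The server names, IP addresses, and population sizes are made up for illustration.

```python
from collections import Counter


def assign_by_source_ip(clients, servers):
    """Toy model of source-IP persistence (an assumption, not Netscaler code).

    The first connection from an IP goes to the least-loaded server;
    every later connection from that IP reuses the same server.
    """
    persistence = {}                      # source IP -> chosen server
    load = Counter({s: 0 for s in servers})
    for source_ip in clients:
        if source_ip not in persistence:
            # Ties are broken by list order, so the result is deterministic.
            persistence[source_ip] = min(servers, key=lambda s: load[s])
        load[persistence[source_ip]] += 1
    return load


# Hypothetical population: two NAT pools of 100 users each,
# plus 40 users who each have their own source IP.
servers = ["cas1", "cas2", "cas3", "cas4"]
clients = ["203.0.113.10"] * 100 + ["198.51.100.20"] * 100
clients += [f"192.0.2.{i}" for i in range(1, 41)]

load = assign_by_source_ip(clients, servers)
for server in servers:
    print(server, load[server])
```

In this model each NAT pool is pinned to a single server as one block, so two servers carry 100 connections each while the other two carry 20 each, even though every individual placement decision used least connections.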

What the docs say

The official Citrix Exchange configuration guide says to configure everything as shown in the image below, however cookie persistence is set to 2 minutes. As mentioned earlier, this doesn't work for OWA, or Outlook Anywhere. (OWA messages are lost, etc)

Question

What should persistence be set to?
When does the persistence timer get reset for OWA, RPC/HTTPS, and Activesync? (how do each of those go "idle")
Is it possible that Outlook 2010 SP2 isn't perfectly cookie aware, so that some HTTP calls don't send a cookie and fall back to the backup persistence?
How does ActiveSync Direct Push interact with the persistence setting defined below? An HTTP "long poll" might be seen as idle: the client says "Give me /Activesync/DirectPush" via a POST, and the server (by Direct Push design) accepts the connection and doesn't send data until a new message arrives. That means data is being exchanged at the TCP level (SYN/ACK) but not at the HTTP level.
To clarify my earlier bullet, what counts as "traffic" for persistence? Is it TCP/layer 4 or HTTP layer 5? Does SSL data count?
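
To make the timer questions above concrete, here is a minimal sketch of an idle-based persistence table. It assumes the timer is refreshed only when a new HTTP request arrives, which is exactly the behavior being asked about and is not confirmed for the Netscaler. The class, the client key, and the request times are hypothetical, and a timeout of 0 meaning "infinite" is not modeled.

```python
from itertools import cycle


class PersistenceTable:
    """Hypothetical idle-timeout persistence table (times in minutes)."""

    def __init__(self, timeout_minutes):
        self.timeout = timeout_minutes
        self.entries = {}  # client key -> (server, minute of last request)

    def route(self, client, now, pick_server):
        entry = self.entries.get(client)
        if entry is not None and now - entry[1] <= self.timeout:
            server = entry[0]          # entry still alive: stay on same server
        else:
            server = pick_server()     # entry missing or expired: pick again
        self.entries[client] = (server, now)  # each request refreshes the timer
        return server


def simulate(timeout_minutes, request_times):
    """Route one client's requests and return the server used for each."""
    rotation = cycle(["cas1", "cas2", "cas3", "cas4"])
    table = PersistenceTable(timeout_minutes)
    return [table.route("owa-user", t, lambda: next(rotation))
            for t in request_times]


# A user loads OWA at minute 0, clicks at minute 3, then spends
# 10 minutes composing a message before sending it at minute 13.
short_timeout = simulate(5, [0, 3, 13])
long_timeout = simulate(180, [0, 3, 13])
print(short_timeout)
print(long_timeout)
```

Under this assumption, the 5-minute timeout sends the minute-13 request to a different server than the one holding the session, while the 180-minute timeout keeps all three requests on the same server. Whether a long poll or SSL traffic would also refresh the real timer remains the open question.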

I've called Netscaler support about this issue, and they told me the preferred settings (which don't work). I'll also open a ticket and refer them to this SO post.

Note: This configuration detail might matter when discussing this issue: I am SSL offloading at the Netscaler

I'm running Exchange 2010 SP2 RU4 Release 2 everywhere

score 2 · Answer 1 · answered Oct 06 '15 at 21:00

Don't claim to have all of the answers (haven't supported Exchange in many years), but...

Not sure what guidance you used, but the Citrix NetScaler Deployment Guide for Microsoft Exchange 2010 suggests a timeout of 180 minutes, or 3 hours. While the diagram on page 15 is misleading (it shows the default value), the summary table on page 21 contains the recommendation.
Regarding your timer questions, read the background section of CTX108883, but the short answer is "The expiry time is client software implementation dependent, and usually such cookies expire when the software is properly closed."
Is there a reason why you're running HTTP Cookie Version 0 and not Version 1?

Never heard of Cookie version 1 vs 0... I'll need to learn more — makerofthings7, Oct 06 '15 at 22:02
