I have a Netscaler that I'm using to front-end four Exchange 2010 CAS servers. The clients that use this virtual server include:
- Outlook Web Access (OWA)
- Exchange Web Services (Outlook for Mac 2011)
- Outlook Anywhere (aka RPC/HTTPS)
- ActiveSync
The special thing about Outlook Anywhere is that Outlook 2010 SP2 and newer will use cookies to maintain state. Older versions of Outlook don't use cookies and fall back to maintaining state by source IP.
This is my current configuration:
Issues
- If I reboot a single CAS server in the array, the clients never fully recover and experience connection delays and issues until all CAS servers are reset at the same time. (Todo: attempt to clear the "session table" on the Netscaler instead of rebooting all the CAS servers at the same time.)
- When I first looked at this server, the Cookie Insert persistence timeout was set to Zero, AKA infinity. This means that cookie-based sessions were never rebalanced based on 'Least Connection'. Three CAS servers had 100% CPU while one CAS had almost nothing.
  - I decreased the Cookie Insert timeout from 0 to 5 (read: from infinity to 5 minutes), and users were getting kicked out of OWA even while composing a new email (a bad thing, since the message was lost).
  - Some RPC/HTTPS users were getting sporadic disconnects (I'm not sure whether they were on Outlook 2010 SP2 or newer, but I suspect they were) (cached mode or non-cached).
- Many of our users are behind a NAT pool, meaning that everyone behind the same SourceIP shares the same CAS server. If that user population is 100+ users, then every time the session times out, all of those users are moved to a new CAS server (shuffling around a huge load).
  - If two large user populations behind a NAT land on the same CAS server, then that CAS server will get overloaded and reach 100% CPU.
- I think it's possible that if a NAT'd population of users has a mixture of cookie (OWA / O2010 SP2) and non-cookie (older than O2010 SP1) clients, the Netscaler may conflate the two load balancing techniques and mess up.
Keeping the last bullet point in mind, and the fact that when I reboot a single CAS server Outlook Anywhere / HTTP acts weird until I reboot all CAS servers, I think that the act of rebooting splits Outlook Anywhere calls among different CAS servers and persistence is all screwed up. (Perhaps Outlook 2010 doesn't work correctly with cookies for all RPC calls such as FreeBusy / Calendar lookups / Normal Outlook traffic)
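To make the NAT concern above concrete, here is a minimal Python sketch of source-IP persistence. It is a toy model, not the Netscaler's actual algorithm: the function name `assign_by_source_ip`, the pool sizes, and the IP addresses are all assumptions I made up for illustration, and "least connection" is approximated by counting pinned clients.

```python
from collections import Counter

def assign_by_source_ip(clients, servers):
    """Pin every client to the server first chosen for its source IP.

    clients: list of (client_id, source_ip) tuples in arrival order.
    A previously unseen source IP goes to the server with the fewest
    pinned clients (a rough stand-in for a least-connection decision).
    """
    pinned = {}                               # source_ip -> server
    load = Counter({s: 0 for s in servers})   # server -> pinned clients
    for _client_id, source_ip in clients:
        if source_ip not in pinned:
            # min() returns the first server among ties
            pinned[source_ip] = min(servers, key=lambda s: load[s])
        load[pinned[source_ip]] += 1
    return dict(load)

servers = ["cas1", "cas2", "cas3", "cas4"]

# Hypothetical population: two NAT pools of 100 users each,
# plus 20 users who each have their own source IP.
clients = [(f"a{i}", "203.0.113.10") for i in range(100)]
clients += [(f"b{i}", "203.0.113.20") for i in range(100)]
clients += [(f"d{i}", f"198.51.100.{i}") for i in range(20)]

load = assign_by_source_ip(clients, servers)
print(load)
```

In this model each NAT pool can only move as a single block, so the load balancer has no way to spread a 100-user pool across servers, no matter which balancing method picks the server for the first connection.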
What the docs say
The official Citrix Exchange configuration guide says to configure everything as shown in the image below; however, cookie persistence is set to 2 minutes. As mentioned earlier, this doesn't work for OWA or Outlook Anywhere (OWA messages are lost, etc.).
Question
- What should persistence be set to?
- When does the persistence timer get reset for OWA, RPC/HTTPS, and ActiveSync? (How does each of those go "idle"?)
- Is it possible that Outlook 2010 SP2 isn't perfectly cookie-aware, so that some HTTP calls don't use a cookie and fall back to the backup persistence?
- How does ActiveSync Direct Push interact with the persistence setting defined below? An HTTP "long poll" might be seen as idle: the client says "Give me /Activesync/DirectPush" via a POST, and the server (by Direct Push design) accepts the connection and doesn't send data until a new message arrives. That means that data is being transferred at the TCP level (SYN/ACK) but not at the HTTP level.
- To clarify my earlier bullet, what counts as "traffic" for persistence? Is it TCP (layer 4) or HTTP (layer 7)? Does SSL data count?
I've called Netscaler support about this issue, and they told me the preferred settings (which don't work). I'll also open a ticket and refer them to this SO post.
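To illustrate why the Direct Push question matters, here is a small Python sketch of an idle-timeout persistence table. It models only one possible answer, namely that the timer is refreshed solely when a new HTTP request arrives; whether the Netscaler actually behaves this way is exactly what I'm asking. The class `PersistenceTable`, the helper `replay`, the round-robin server pick, and the 900-second long-poll interval are all hypothetical.

```python
from itertools import cycle

class PersistenceTable:
    """Toy idle-timeout persistence table keyed by a client identifier."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.entries = {}  # key -> (server, time of last request)

    def lookup(self, key, now, pick_server):
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] <= self.timeout_s:
            server = entry[0]          # entry still fresh: stay on the same server
        else:
            server = pick_server()     # missing or expired: choose a server again
        self.entries[key] = (server, now)  # assumption: only a request refreshes the timer
        return server

def replay(request_times, timeout_s):
    """Return the server chosen for each request time (in seconds)."""
    servers = cycle(["cas1", "cas2", "cas3", "cas4"])
    table = PersistenceTable(timeout_s)
    return [table.lookup("device-1", t, lambda: next(servers)) for t in request_times]

# A client that sends a request every 60 s against a 2-minute timeout
chatty = replay([0, 60, 120, 180], timeout_s=120)
# A long-poll client that only issues a new request every 900 s
long_poll = replay([0, 900, 1800], timeout_s=120)

print(chatty)     # ['cas1', 'cas1', 'cas1', 'cas1']
print(long_poll)  # ['cas1', 'cas2', 'cas3']
```

Under that assumption, any client whose request interval exceeds the persistence timeout lands on a different server for every request. If TCP-level activity also refreshes the timer, the long-poll client would stay put instead.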
Note: These configuration details might matter when discussing this issue:
- I am SSL offloading at the Netscaler.
- I'm running Exchange 2010 SP2 RU4 Release 2 everywhere.