We have multiple web servers behind a Netscaler load balancer. The (Win2008) servers each host an ASP.NET application (using IIS 6.1) that is configured to use a central server running the .NET StateServer service for session state management.
When we configure the load balancer to use "round robin" or "least busy" routing, the web application regularly crashes with an error indicating that something it expected to find in the session state was missing. However, it doesn't happen all the time: only on certain steps, and then on about 75% of attempts.
When we configure the load balancer to use server persistence (so the user "sticks" to one server), the issue does not occur. (But this is not our desired running mode.)
Things we have checked/done already:
- Restarted everything
- The machine keys are the same on all servers
- There are no connectivity issues between the web servers and the state server
- IIS site names, paths and IDs are the same on all web servers
- No error is logged in the application logs
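
For anyone wanting to repeat the configuration checks above mechanically, here is a rough sketch (not what we actually ran) that compares the `machineKey` and `sessionState` sections of each server's web.config. It assumes the file text has already been copied off each server, that both sections sit directly under `system.web` with no XML namespace, and the function names and server labels are made up for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET

# web.config sections expected to match on every server in the farm.
SECTIONS = ("machineKey", "sessionState")


def section_fingerprints(web_config_xml):
    """Return a short hash per section, or None when the section is absent."""
    root = ET.fromstring(web_config_xml)
    # Assumption: <system.web> is a direct child of the root element.
    system_web = next((c for c in root if c.tag == "system.web"), None)
    result = {}
    for name in SECTIONS:
        element = None
        if system_web is not None:
            element = next((c for c in system_web if c.tag == name), None)
        if element is None:
            result[name] = None
            continue
        # Sort attributes so ordering differences are not reported as mismatches.
        canonical = "|".join(
            "{}={}".format(k, v) for k, v in sorted(element.attrib.items())
        )
        result[name] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return result


def find_mismatches(configs):
    """configs maps a (hypothetical) server label to its web.config text.

    Returns the names of the sections that are not identical everywhere.
    """
    prints = {label: section_fingerprints(text) for label, text in configs.items()}
    mismatched = []
    for name in SECTIONS:
        if len({p[name] for p in prints.values()}) > 1:
            mismatched.append(name)
    return mismatched
```

Calling `find_mismatches({"web01": text_a, "web02": text_b})` returns an empty list when both sections agree on every server. It only compares attributes in web.config itself, so it would not catch differences in IIS site IDs or in settings inherited from machine.config.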
Does anyone have any suggestions on other things to check / possible causes?
NB. We have exactly the same setup in another environment (same type of load balancers, same web app, same config, same server setup) and it works fine. The only difference is the version of VMware Tools, but we can't see that being the cause.