My application uses AppFabric for distributed caching in a production web farm of 5 Windows web servers. The application is a .NET 4 C# web application. We are encountering some problems with AppFabric and have some questions about its setup. The main issue is that if one of the 5 web servers is restarted, the site on the other servers also goes down for a short period, with AppFabric exceptions like the following appearing in our event logs:
- Message: ErrorCode:SubStatus:There is a temporary failure. Please retry later.
- ErrorCode:SubStatus:Region referred to does not exist. Use CreateRegion API to fix the error.
We have a cache provider wrapper class that creates the DataCacheFactory object and acts as the intermediary between the web application and AppFabric. It is a singleton, so only one DataCacheFactory instance is created, in the Init of the class.
I believe I have found the reason for the second error above. In our code the region was created in Init, i.e. once at the very start, but if the node that holds the region in its memory drops out of the cluster, the region is lost and the above error results. To resolve this, the code should attempt to create the region on every request to AppFabric, creating it only if it does not already exist. Does this sound correct?
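To make the idea concrete, here is a minimal sketch of the kind of wrapper I have in mind. It is a slight variation on the above: rather than calling CreateRegion before every request, it recreates the region only when a call fails with RegionDoesNotExist, which avoids an extra round trip. The class name CacheProvider and the region name "AppRegion" are placeholders rather than our real names, and I am assuming the single-argument CreateRegion overload, which as far as I know returns false when the region already exists.

```csharp
using System;
using Microsoft.ApplicationServer.Caching;

// Hypothetical singleton wrapper; names are placeholders for illustration only.
public sealed class CacheProvider
{
    private static readonly Lazy<CacheProvider> instance =
        new Lazy<CacheProvider>(() => new CacheProvider());

    public static CacheProvider Instance { get { return instance.Value; } }

    private const string CacheName = "App1Cache";   // matches the cluster config
    private const string RegionName = "AppRegion";  // assumed region name

    private readonly DataCacheFactory factory;
    private readonly DataCache cache;

    private CacheProvider()
    {
        // Reads the dataCacheClient section from web.config.
        factory = new DataCacheFactory();
        cache = factory.GetCache(CacheName);
    }

    public object Get(string key)
    {
        try
        {
            return cache.Get(key, RegionName);
        }
        catch (DataCacheException ex)
        {
            // Only handle the "region does not exist" case; rethrow everything else.
            if (ex.ErrorCode != DataCacheErrorCode.RegionDoesNotExist)
            {
                throw;
            }

            // The node holding the region has probably left the cluster:
            // recreate the region and try once more.
            cache.CreateRegion(RegionName);
            return cache.Get(key, RegionName);
        }
    }
}
```

The same pattern would apply to Put and Remove. After the region is recreated it is empty, so the retried Get simply returns null and the caller falls back to the underlying data source.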
Regarding the other error, I believe it may be down to the configuration. This is the cluster config XML file:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<configSections>
<section name="dataCache" type="Microsoft.ApplicationServer.Caching.DataCacheSection, Microsoft.ApplicationServer.Caching.Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" />
</configSections>
<dataCache size="Small">
<caches>
<cache consistency="StrongConsistency" name="App1Cache"
secondaries="1">
<policy>
<eviction type="Lru" />
<expiration defaultTTL="10" isExpirable="true" />
</policy>
</cache>
<cache consistency="StrongConsistency" name="App2Cache"
secondaries="1">
<policy>
<eviction type="Lru" />
<expiration defaultTTL="10" isExpirable="true" />
</policy>
</cache>
<cache consistency="StrongConsistency" name="App3Cache"
secondaries="1">
<policy>
<eviction type="Lru" />
<expiration defaultTTL="10" isExpirable="true" />
</policy>
</cache>
<cache consistency="StrongConsistency" name="default">
<policy>
<eviction type="Lru" />
<expiration defaultTTL="10" isExpirable="true" />
</policy>
</cache>
</caches>
<hosts>
<host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
hostId="724664608" size="1228" leadHost="true" account="SERVER1\user"
cacheHostName="AppFabricCachingService" name="SERVER1"
cachePort="22233" />
<host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
hostId="598646137" size="1228" leadHost="true" account="SERVER2\user"
cacheHostName="AppFabricCachingService" name="SERVER2"
cachePort="22233" />
<host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
hostId="358039700" size="1228" leadHost="true" account="SERVER3\user"
cacheHostName="AppFabricCachingService" name="SERVER3"
cachePort="22233" />
<host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
hostId="929915039" size="1228" leadHost="false" account="SERVER4\user"
cacheHostName="AppFabricCachingService" name="SERVER4"
cachePort="22233" />
<host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
hostId="1752630351" size="1228" leadHost="false" account="SERVER5\user"
cacheHostName="AppFabricCachingService" name="SERVER5"
cachePort="22233" />
</hosts>
<advancedProperties>
<securityProperties>
<authorization>
<allow users="everyone" />
</authorization>
</securityProperties>
</advancedProperties>
</dataCache>
</configuration>
Note: we have multiple caches set up because multiple applications use AppFabric, and we see the same issues with all of them.
And this is the web.config entry in the application on each of the servers:
<dataCacheClient requestTimeout="15000" channelOpenTimeout="3000" maxConnectionsToServer="1">
<localCache isEnabled="true" sync="TimeoutBased" ttlValue="300" objectCount="10000" />
<clientNotification pollInterval="300" maxQueueLength="10000" />
<hosts>
<host name="SERVER1" cachePort="22233" />
<host name="SERVER2" cachePort="22233" />
<host name="SERVER3" cachePort="22233" />
<host name="SERVER4" cachePort="22233" />
<host name="SERVER5" cachePort="22233" />
</hosts>
<transportProperties connectionBufferSize="131072" maxBufferPoolSize="268435456" maxBufferSize="8388608" maxOutputDelay="2" channelInitializationTimeout="60000" receiveTimeout="600000" />
</dataCacheClient>
Can anyone see a problem with the above? As you can see, we have 3 lead hosts and 2 secondaries.
Some questions I have following on from this are:
- I have read about having a local cache. What is the technical benefit of this, i.e. will it give each node a local copy of the data?
- What is the best practice regarding ports? Are the above ports correct, or could there be conflicts because the same ports are used on every host?
- Is 3 lead hosts and 2 secondaries a recommended split? Does it mean there are 3 copies of the data?
When restarting the servers, we try never to restart the lead hosts at the same time.
Thanks for any feedback on this!