The Scenario:
We have written a back-end service that periodically checks for new rows in a database and inserts them into AppFabric Cache. We have been using this approach for about 2 months.
We use a single server machine to store data in the cache, in different regions. We are not using a cluster of machines; it's only a single machine. The "default" cache has been divided into 3 regions for 3 different environments. Each environment accesses this machine for cached data from its own cache region.
It was working fine until a few days ago, when we started getting the following exception:
ErrorCode: ERRCA0016 :SubStatus ES0001 : The connection was terminated, possibly due to server or network problems or serialized Object size is greater than MaxBufferSize on server. Result of the request is unknown.
Subsequent access to the cache throws the following exception: ErrorCode ERRCA0017 : SubStatus ES0001: There is a temporary failure. Please retry later.
After 4 to 5 tries, we get the following exception: ErrorCode ERRCA0018 : SubStatus ES0001 : The request timed out.
After all this, any access to the cache throws the following exception: ErrorCode ERRCA0005 : SubStatus ES0001 : Region referred to does not exist. Use CreateRegion API to fix the error.
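For reference, the failure sequence above (temporary failure, timeout, then a missing region) is the kind of thing a client-side retry wrapper usually has to handle. The following is only a minimal sketch of that idea in Python, not the AppFabric API: `TemporaryFailure`, `RegionMissing`, `put_with_retry`, and the `cache` object with `put` and `create_region` methods are all hypothetical stand-ins, and the assumption that the region has to be recreated after a failure is ours.

```python
import time


class TemporaryFailure(Exception):
    """Hypothetical stand-in for ERRCA0017 (retry later) or ERRCA0018 (timeout)."""


class RegionMissing(Exception):
    """Hypothetical stand-in for ERRCA0005 (region does not exist)."""


def put_with_retry(cache, region, key, value,
                   attempts=5, base_delay=0.5, sleep=time.sleep):
    """Store a value, backing off on temporary failures and recreating
    the region if the cache reports that it no longer exists.

    Returns the number of put calls that were needed.
    """
    for attempt in range(attempts):
        try:
            cache.put(region, key, value)
            return attempt + 1
        except RegionMissing:
            # Assumption: the region was lost on the server side,
            # so recreate it and try the put again.
            cache.create_region(region)
        except TemporaryFailure:
            # Exponential backoff: 0.5s, 1s, 2s, ... by default.
            sleep(base_delay * (2 ** attempt))
    raise TemporaryFailure(f"gave up after {attempts} attempts")
```

A wrapper like this only hides the symptoms; it does not explain why the connection is being terminated in the first place.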
Looking into the first error (ErrorCode: ERRCA0016 : SubStatus ES0001), we checked whether the serialized object being stored is larger than the max buffer size. But larger objects had been stored in the cache before this without any problem, so this does not seem to be the cause.
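The size check described above can be sketched roughly as follows. This is an illustration only: it uses Python's `pickle` as a stand-in for the real serializer, and `MAX_BUFFER_SIZE`, `serialized_size`, and `fits_in_buffer` are hypothetical names. The 8 MB figure is an assumed limit; the real value is whatever MaxBufferSize is configured to on the server.

```python
import pickle

# Assumed limit for illustration only; use the server's configured MaxBufferSize.
MAX_BUFFER_SIZE = 8 * 1024 * 1024


def serialized_size(obj):
    """Return the number of bytes obj occupies once serialized with pickle."""
    return len(pickle.dumps(obj))


def fits_in_buffer(obj, limit=MAX_BUFFER_SIZE):
    """Return True if the serialized object is no larger than the limit."""
    return serialized_size(obj) <= limit
```

Logging the serialized size of each object just before it is stored would show whether the failures line up with unusually large items.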
What could the exact problem be?
EDITED: We looked at the Windows event logs for AppFabric Cache. These are some of the most frequent error entries we found.
Source : AppFabricCachingService.Failfast
Param : Lease with external store expired: Microsoft.Fabric.Federation.ExternalRingStateStoreException: Lease already expired at Microsoft.Fabric.Data.ExternalStoreAuthority.UpdateNode(NodeInfo nodeInfo, TimeSpan timeout) at Microsoft.Fabric.Federation.SiteNode.PerformExternalRingStateStoreOperations(Boolean& canFormRing, Boolean isInsert, Boolean isJoining) General : AppFabric Caching service crashed. Lease with external store expired: Microsoft.Fabric.Federation.ExternalRingStateStoreException: Lease already expired at Microsoft.Fabric.Data.ExternalStoreAuthority.UpdateNode(NodeInfo nodeInfo, TimeSpan timeout) at Microsoft.Fabric.Federation.SiteNode.PerformExternalRingStateStoreOperations(Boolean& canFormRing, Boolean isInsert, Boolean isJoining)}
Source: AppFabricCachingService.Crash
Param :
System.Runtime.CallbackException: Async Callback threw an exception. ---> System.IdentityModel.Tokens.SecurityTokenValidationException: The service does not allow you to log on anonymously. at ......
Probable Causes
Searching for the above event log errors suggested the following cause. The cache server is a separate machine, and it was using the SQL configuration store, whose database was on a different server. So there could be a failure in creating the connection between the cache server and the database server while reading the cache configuration from the SQL database. We therefore moved the cache configuration from SQL to XML, but we still got these errors:
ErrorCode ERRCA0017 : SubStatus ES0001: There is a temporary failure. Please retry later.
ErrorCode ERRCA0005 : SubStatus ES0001 : Region referred to does not exist. Use CreateRegion API to fix the error.
After some more digging, our guess is that the problem could be this: whenever a machine that has not been granted permission to access AppFabric Cache tries to connect, it makes some number of attempts, and then AppFabric stops working. After granting access to the cache using the Grant PowerShell commands, the machine is now able to access the cache. We will have to monitor this for a few days.
Could this be a valid reason?