
We have 2 memcached servers configured and use the Enyim client. When one of the servers is down, it appears to be added to the deadServers list (ServerPool.cs), and the client tries to resurrect it every 10 seconds (we have configured deadTimeOut to be 10 seconds). Each attempt to connect to the failed server causes a TCP timeout, so pages take a long time to load, which results in a bad user experience.

1) What is the standard way of resolving this issue? There are some posts about removing the server from the deadServers list. Is it okay to do this?

2) What is the recommended deadTimeOut setting? (I understand the default is 2 minutes; we have changed it to 10 seconds in our implementation.)

3) Am I correct in my understanding that the cached data is not replicated across Server 1 and Server 2? If Server 1 is down, does the client go to the database to fetch these values (without really checking Server 2)?

Any help is really appreciated.

user25164

1 Answer

  1. As a general rule, you are expected to just accept that the cache may or may not have what you want, so an unreachable server is handled like a cache miss.
  2. It depends on the scenario, but it sounds like you might benefit from a higher value. There's no great loss in having it higher (2-5 minutes).
  3. Yes, the data is not replicated. The values will usually be cached again on Server 2 (after being fetched from the DB because Server 1's cache is unavailable).
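A minimal sketch of that "treat a dead server as a miss" approach, in Python rather than Enyim's actual API. The names `CacheUnavailable`, `cache_get`, `cache_set` and `load_from_db` are hypothetical and stand in for whatever your client and data layer provide:

```python
class CacheUnavailable(Exception):
    """Hypothetical error raised when a cache server cannot be reached."""


def get_with_fallback(key, cache_get, cache_set, load_from_db):
    """Return the value for key, treating any cache failure as a plain miss."""
    try:
        value = cache_get(key)
    except CacheUnavailable:
        value = None  # a dead server is handled exactly like a miss

    if value is not None:
        return value

    # Miss (or dead server): go to the database.
    value = load_from_db(key)
    try:
        # Best effort: the value lands on whichever server the client picks now.
        cache_set(key, value)
    except CacheUnavailable:
        pass  # never fail the request because the cache is down
    return value
```

The point is only that the page still renders from the database when the cache is unreachable; how quickly it does so depends on the connect timeout.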

You probably also want to lower the TCP timeout used when reconnecting to the possibly-dead server.
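To illustrate how the two settings interact, here is a rough Python sketch (not Enyim's ServerPool.cs): a server marked dead is skipped without any network traffic until `dead_timeout` has passed, and the reconnect probe itself uses a short connect timeout. The class, the method names and the 0.25 second value are assumptions made for the example; only `socket.create_connection` is a real library call.

```python
import socket
import time


def tcp_probe(host, port, timeout=0.25):
    """Return True if a TCP connection succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class ServerPool:
    """Skips dead servers and probes each at most once per dead_timeout."""

    def __init__(self, servers, dead_timeout=10.0,
                 probe=tcp_probe, clock=time.monotonic):
        self.servers = list(servers)      # (host, port) tuples
        self.dead_timeout = dead_timeout
        self.probe = probe
        self.clock = clock
        self.retry_at = {}                # dead server -> time of next probe

    def mark_dead(self, server):
        """Call this when an operation against `server` fails."""
        self.retry_at[server] = self.clock() + self.dead_timeout

    def usable(self, server):
        retry_time = self.retry_at.get(server)
        if retry_time is None:
            return True                   # not marked dead
        if self.clock() < retry_time:
            return False                  # still dead: no network access
        if self.probe(*server):
            del self.retry_at[server]     # resurrected
            return True
        self.mark_dead(server)            # probe failed: wait another period
        return False

    def usable_servers(self):
        return [s for s in self.servers if self.usable(s)]
```

With this shape, only one request per `dead_timeout` period pays for a probe, and that cost is bounded by the connect timeout rather than by the operating system's default TCP timeout.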

mibus
  • In larger farms, how is this configured? When one of the memcache servers goes down, I think it's unacceptable to wait for a TCP timeout, which may easily take around 20-30 seconds, and the user experience would be degraded until that server is back up. "You probably also lower your TCP timeout" -> Is this the deadTimeOut setting? – user25164 Nov 09 '09 at 22:31
  • It depends on the features of the API you're using, but at least some let you specify how long you want to wait for results before assuming the server has died, e.g. in PHP, the "timeout" option to the connect method: http://www.php.net/manual/en/function.memcache-connect.php Use a lower timeout there to detect the failure quickly. Remember though that it'll only be an issue once every {deadTimeOut} seconds; in between, the client remembers that the server is offline. You should read the FAQ: http://code.google.com/p/memcached/wiki/FAQ#How_does_memcached_handle_failover? – mibus Nov 10 '09 at 20:21
  • Possibly also: http://code.google.com/p/memcached/wiki/FAQ#What_is_a_%22consistent_hashing%22_client? – mibus Nov 10 '09 at 20:22
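For readers unfamiliar with the term, a small Python sketch of a consistent-hashing ring follows. It is an illustration of the general idea, not the algorithm any particular memcached client uses; the `HashRing` class, the use of MD5 and the 100 virtual points per server are assumptions chosen for the example.

```python
import bisect
import hashlib


def _point(label):
    """Map a string to a ring position using MD5 (any stable hash works)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server gets several virtual points to spread keys evenly.
        for i in range(self.replicas):
            p = _point(f"{server}#{i}")
            self._owner[p] = server
            bisect.insort(self._points, p)

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def server_for(self, key):
        if not self._points:
            raise LookupError("no servers in ring")
        # First ring position after the key's hash, wrapping around at the end.
        i = bisect.bisect(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[i]]
```

When a server is removed from the ring, only the keys it owned move to another server; keys that lived on the surviving servers stay where they were, so losing one node does not invalidate the whole cache.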