I have multiple cache servers that are added to a virtual pool at the PHP layer via the Memcached::addServers() API. During fault-tolerance testing I noticed that taking one of the memcached servers offline increased the application's response times to 3-6 seconds per request, when they normally take 0.5-2 seconds.
I've applied the following settings, per the advice of this blog post:
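$memcached = new Memcached();
// Socket connect timeout, in milliseconds (applies in non-blocking mode).
$memcached->setOption(Memcached::OPT_CONNECT_TIMEOUT, 10);
// Consistent hashing, so losing a server remaps only that server's keys.
$memcached->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
// Drop servers that are marked as failed from the pool.
$memcached->setOption(Memcached::OPT_REMOVE_FAILED_SERVERS, true);
// Seconds to wait before retrying a failed server.
$memcached->setOption(Memcached::OPT_RETRY_TIMEOUT, 1);
$memcached->addServers($servers);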
However, these settings did not appear to reduce the lag. The only thing that resolves it is returning the server to the pool, at which point the lag disappears. Obviously this is not an ideal solution, as it may take us 15 minutes to resolve the issue in production (i.e. the dreaded 3am page).
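To narrow down where the extra time goes while a server is offline, per-call latency could be measured with something like the sketch below. The helper names `timeCall` and `probeKeys` and the `probe:*` keys are hypothetical, made up for illustration; only the `Memcached` methods (`get`, `getResultCode`, `getResultMessage`) come from the extension.

```php
<?php
// Hypothetical helper: run one operation and report its result and duration.
function timeCall(callable $op): array
{
    $start = microtime(true);
    $result = $op();
    $elapsedMs = (microtime(true) - $start) * 1000.0;

    return ['result' => $result, 'ms' => $elapsedMs];
}

// Hypothetical helper: time a get() for each key and record the client's
// result code and message, so slow or failing lookups stand out.
function probeKeys(Memcached $memcached, array $keys): array
{
    $report = [];
    foreach ($keys as $key) {
        $timing = timeCall(function () use ($memcached, $key) {
            return $memcached->get($key);
        });
        $report[$key] = [
            'ms'      => $timing['ms'],
            'code'    => $memcached->getResultCode(),
            'message' => $memcached->getResultMessage(),
        ];
    }

    return $report;
}

// Example usage, with $memcached configured as in the settings above:
// foreach (probeKeys($memcached, ['probe:1', 'probe:2', 'probe:3']) as $key => $row) {
//     printf("%s: %.1f ms, code %d (%s)\n", $key, $row['ms'], $row['code'], $row['message']);
// }
```

If only the keys that hash to the offline server are slow, the delay is likely in connection attempts to that server rather than in the rest of the request.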
In researching this issue, I came across this post, which discusses the use of moxi and sounds interesting. But before I go and introduce yet another layer into our application, I wanted to ask: how have others resolved these lag issues?