I generally store my list of available servers in APC, so I can modify it on the fly. You're correct that clients will keep trying to use a down server for as long as it's listed. Luckily, with the newer consistent hashing methods it's not a big deal to pull it from rotation.
I would avoid using a brand new PHP extension, or trying to add new software to your deployment stack. You're likely already using something for monitoring (Nagios?). Having it invoke a simple PHP script on each of your webservers to tweak the in-memory list seems like the best bet.
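As a rough sketch of what that script could look like: the functions below keep the pool in the user cache and drop one entry from it. This assumes the APCu extension (with plain APC the calls would be apc_fetch/apc_store instead); the cache key name and the function names are made up for illustration, and the code falls back to a default list when the cache is unavailable.

```php
<?php
// Hypothetical cache key; any name works as long as every script agrees on it.
const SERVER_LIST_KEY = 'memcached_servers';

/** Load the pool from APCu, falling back to $default if nothing is cached. */
function loadServers(array $default): array
{
    if (function_exists('apcu_fetch')) {
        $found = false;
        $servers = apcu_fetch(SERVER_LIST_KEY, $found);
        if ($found && is_array($servers)) {
            return $servers;
        }
    }
    return $default;
}

/** Return the pool without the given host/port pair. */
function dropServer(array $servers, string $host, int $port): array
{
    $kept = array_filter($servers, function (array $s) use ($host, $port): bool {
        return !($s[0] === $host && $s[1] === $port);
    });
    return array_values($kept); // re-index so the list stays a plain array
}

/** Store the pool back in APCu; returns false when APCu is not available. */
function saveServers(array $servers): bool
{
    return function_exists('apcu_store') && apcu_store(SERVER_LIST_KEY, $servers);
}

// What the monitoring hook would do when localhost:11212 goes down.
$default = [['localhost', 11211], ['localhost', 11212]];
$servers = dropServer(loadServers($default), 'localhost', 11212);
saveServers($servers);
```

The application code would then call loadServers() and pass each entry to addServer(), so a change made by the monitoring hook is picked up on the next request.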
It's worth noting that under the Ketama hashing system, removing a server from rotation only causes that server's keys to be re-hashed elsewhere on the ring (continuum); keys on the other servers stay where they are. Visualize it as a circle: each server is assigned multiple points on the circle (100-200). Keys are hashed to the circle and continue clockwise until they find a server. Removing a server from the ring only results in its values continuing a bit further to find a new server. With luck, the displaced values will be spread evenly across the remaining servers.
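To make the circle picture concrete, here is a toy ring in plain PHP. It is only a sketch of the idea, not the library's implementation: it uses crc32 for brevity where real Ketama uses MD5, and buildRing/lookup are names invented for this example.

```php
<?php
/** Place each server at $points positions on the ring, keyed by hash value. */
function buildRing(array $servers, int $points = 150): array
{
    $ring = [];
    foreach ($servers as $server) {
        for ($i = 0; $i < $points; $i++) {
            $ring[crc32("$server-$i")] = $server;
        }
    }
    ksort($ring); // walking the array in key order is walking clockwise
    return $ring;
}

/** Hash the key onto the ring and walk clockwise to the first server point. */
function lookup(array $ring, string $key): string
{
    $hash = crc32($key);
    foreach ($ring as $position => $server) {
        if ($position >= $hash) {
            return $server;
        }
    }
    return reset($ring); // went past the last point: wrap to the start
}

$all = [];
foreach (range(11210, 11219) as $port) {
    $all[] = "localhost:$port";
}
$full = buildRing($all);
$less = buildRing(array_diff($all, ['localhost:11212']));

// Count the keys whose server changes once 11212 is pulled from the ring.
$moved = 0;
for ($i = 0; $i < 1000; $i++) {
    if (lookup($full, "key$i") !== lookup($less, "key$i")) {
        $moved++;
    }
}
echo "$moved of 1000 keys changed server\n";
```

Every key that changes server in this sketch is one that used to live on the removed server; keys on the other nine never move, which is the property described above.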
Demonstrating the hashing system:
<?php
$m = new Memcached();
$m->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
$m->addServer('localhost', 11211);
$m->addServer('localhost', 11212);
$m->addServer('localhost', 11213);
$m->addServer('localhost', 11214);
$m->addServer('localhost', 11215);
$m->addServer('localhost', 11216);
$m->addServer('localhost', 11217);
$m->addServer('localhost', 11218);
$m->addServer('localhost', 11219);
$m->addServer('localhost', 11210);
$key = uniqid(); //You may change this to md5(uniqid()); if you'd like to see a greater variation in keys. I don't think it necessary.
$m->set($key, $key, 5);
var_dump($m->get($key));
unset($m);
$m = new Memcached();
$m->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
//One server removed. If placement on the continuum depended on add order, we would expect the get call here to fail about 90% of the time, succeeding only if the value was stored on the first server. If placement is based on some hash of the server details, we'd expect success about 90% of the time.
$m->addServer('localhost', 11211);
//$m->addServer('localhost', 11212);
$m->addServer('localhost', 11213);
$m->addServer('localhost', 11214);
$m->addServer('localhost', 11215);
$m->addServer('localhost', 11216);
$m->addServer('localhost', 11217);
$m->addServer('localhost', 11218);
$m->addServer('localhost', 11219);
$m->addServer('localhost', 11210);
var_dump($m->get($key));
unset($m);
$m = new Memcached();
$m->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
//2 servers removed
$m->addServer('localhost', 11211);
$m->addServer('localhost', 11212);
//$m->addServer('localhost', 11213);
//$m->addServer('localhost', 11214);
$m->addServer('localhost', 11215);
$m->addServer('localhost', 11216);
$m->addServer('localhost', 11217);
$m->addServer('localhost', 11218);
$m->addServer('localhost', 11219);
$m->addServer('localhost', 11210);
var_dump($m->get($key));
unset($m);
$m = new Memcached();
$m->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
//Out of order
$m->addServer('localhost', 11210);
$m->addServer('localhost', 11211);
$m->addServer('localhost', 11219);
$m->addServer('localhost', 11212);
$m->addServer('localhost', 11217);
$m->addServer('localhost', 11214);
$m->addServer('localhost', 11215);
$m->addServer('localhost', 11216);
$m->addServer('localhost', 11218);
$m->addServer('localhost', 11213);
var_dump($m->get($key));
unset($m);
If the hashing system cared about order or about omitted servers, we would expect to get bool(false) on most of the secondary examples, since an early server was removed, etc. However, based on my quick, completely non-scientific tests, I only get bool(false) in any particular slot about one time in 10. For the test I simply launched 10 servers on my test box, giving each of them only 4 MB of RAM.
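If you'd rather not launch ten daemons, a similar check can be sketched with the extension's getServerByKey(), which reports the server a key maps to. This assumes the memcached extension is loaded and, as far as I know, needs no running servers, since addServer() does not connect immediately. poolFor and countMoved are names made up for this example.

```php
<?php
/** Build a client using the consistent distribution for the given ports. */
function poolFor(array $ports): Memcached
{
    $m = new Memcached();
    $m->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
    foreach ($ports as $port) {
        $m->addServer('localhost', $port);
    }
    return $m;
}

/** Count how many of $n sample keys map to a different server in $b than in $a. */
function countMoved(Memcached $a, Memcached $b, int $n = 1000): int
{
    $moved = 0;
    for ($i = 0; $i < $n; $i++) {
        $key = "key$i";
        if ($a->getServerByKey($key) != $b->getServerByKey($key)) {
            $moved++;
        }
    }
    return $moved;
}

// Only run the comparison when the extension is actually available.
if (extension_loaded('memcached')) {
    $full = poolFor(range(11210, 11219));
    $less = poolFor(array_diff(range(11210, 11219), [11212]));
    echo countMoved($full, $less), " of 1000 keys changed server\n";
}
```

If order or omitted servers mattered, most keys would change server here; with consistent hashing the count should stay near a tenth of the sample.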