I have designed systems that direct query volume in a similar scenario. It may be interesting to include several more things:
- average response time for real traffic to the candidate (not just a monitoring query)
- number of queries in the last time period (60s, etc)
- memory/cpu/disk utilization over some previous time period
I've previously given each of the resources a weight and then basically added them up. So from a single server you might get back:
memory 50(%)
cpu 40(%)
disk 4000 (iops, if you know the limit here making it a % is good)
ms 300 (msecs average response time)
This server's weight would be 4390 (higher is worse here). If CPU is less of a concern for you, you can lower its 'weight' in the calculation so that the decision of which server to use is more accurate for your environment. Note that the raw iops number dominates the sum, which is another reason to turn it into a % if you know the limit.
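As a rough sketch of that calculation in Python: the names here (ServerStats, load_score, pick_server, DEFAULT_WEIGHTS) are made up for illustration, and setting every weight to 1.0 is only an assumption that reproduces the plain sum from the example above.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    memory_pct: float   # memory utilization, %
    cpu_pct: float      # cpu utilization, %
    disk_iops: float    # raw iops (use a % instead if you know the limit)
    response_ms: float  # average response time for real traffic, msecs


# Hypothetical weights: 1.0 everywhere gives the plain "add them up" sum.
DEFAULT_WEIGHTS = {
    "memory_pct": 1.0,
    "cpu_pct": 1.0,
    "disk_iops": 1.0,
    "response_ms": 1.0,
}


def load_score(stats, weights=DEFAULT_WEIGHTS):
    """Weighted sum of the metrics; higher is worse."""
    return sum(w * getattr(stats, metric) for metric, w in weights.items())


def pick_server(candidates, weights=DEFAULT_WEIGHTS):
    """Return the candidate with the lowest (best) score."""
    return min(candidates, key=lambda s: load_score(s, weights))


example = ServerStats("server-a", memory_pct=50, cpu_pct=40,
                      disk_iops=4000, response_ms=300)
print(load_score(example))  # 4390.0, matching the numbers above
```

If CPU matters less in your environment, pass a weights dict with a smaller "cpu_pct" value and the ranking shifts accordingly.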
How you collect these stats affects how frequently they can be collected and how reliable they will be (maybe a node has died since you made the list of least used servers). One approach is to run a reporting daemon on each candidate and query it when you get a client request, maybe via multicast. The reporting daemon can collect stats very frequently so the decision information is as accurate as possible.
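One way to guard against acting on a dead node's old numbers is to ignore any report that is too old. This is only a sketch under my own assumptions: the reports structure (server name mapped to a (reported_at, score) pair), the fresh_candidates name, and the 5 second cutoff are all hypothetical, and it does not cover how the daemon is actually queried.

```python
def fresh_candidates(reports, now, max_age_s=5.0):
    """Keep only servers whose last report is recent enough to trust.

    reports maps server name -> (reported_at, score), with reported_at
    and now in seconds on the same clock (e.g. time.time()).
    """
    return {
        name: score
        for name, (reported_at, score) in reports.items()
        if now - reported_at <= max_age_s
    }


reports = {
    "server-a": (100.0, 4390),  # reported 1s before "now"
    "server-b": (90.0, 1200),   # reported 11s before "now", treated as stale
}
usable = fresh_candidates(reports, now=101.0)
```

Here server-b has the better score but is dropped because it has not reported recently, so only server-a stays in the running.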
It isn't clear how transient the config you are generating is, which is an important consideration when doing the distribution. Will you have clients connected for long periods of time? Is it possible that you need to disconnect and re-distribute clients because a server got overloaded? Perhaps something you've already considered.
Depending on how transient that config is and how much you know about the queries, you could also add some more data to the decision metrics:
- expected client weight currently being served by candidate (if you give clients weights too)
- data set already in memory (if your data size exceeds the memory capacity of the server and you have more than a couple servers you can improve your RAM utilization by balancing queries for specific data sets to servers that already have them in memory)
- uptime of the server (a completely unloaded new box will usually get crushed in weight-based scenarios where the decisions are made frequently, since it looks like the best choice every time until its stats catch up)
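For the uptime point, one possible approach (my own assumption, not something I'd call the standard fix) is to add a penalty to a freshly started server's score and let it decay as the box warms up. The function names, the 300 second warm-up window, and the 1000 point penalty below are all hypothetical and would need tuning against your real scores.

```python
def warmup_penalty(uptime_s, warmup_s=300.0, max_penalty=1000.0):
    """Extra score for a young server, decaying linearly to 0 at warmup_s."""
    uptime_s = max(uptime_s, 0.0)
    if uptime_s >= warmup_s:
        return 0.0
    return max_penalty * (1.0 - uptime_s / warmup_s)


def adjusted_score(raw_score, uptime_s, warmup_s=300.0, max_penalty=1000.0):
    """Raw weighted score plus the warm-up penalty; higher is still worse."""
    return raw_score + warmup_penalty(uptime_s, warmup_s, max_penalty)


# A brand new box with a raw score of 100, halfway through its warm-up window:
halfway = adjusted_score(100.0, uptime_s=150.0)
```

The idea is just that a new box takes on traffic gradually instead of winning every decision the moment it appears.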
Hopefully that helps! It is an interesting problem.