I have an NFSv4 server running Ubuntu 10.04 Server in an enterprise environment with 50+ clients.
Overall, everything works fine, but from time to time a client starts making more than 1000 NFS operations per second, usually because of a software fault or poorly written software on that client.
When this happens, the clients usually start getting a lot of messages like these in dmesg:
[443947.760016] nfs: server 192.1.1.111 not responding, still trying
[443952.696017] nfs: server 192.1.1.111 not responding, still trying
[443954.056079] nfs: server 192.1.1.111 OK
[443954.056311] nfs: server 192.1.1.111 OK
My NFSv4 server launches 96 daemons. It runs on an 8-core CPU with multithreading (16 hardware threads in total) and 16 GB of RAM. Maybe 96 is not a good choice?
I've developed some tools to plot NFS client usage so I can detect problems and kill the clients manually. Of course, I could also automate this. However, I don't want to be that aggressive. Most of the time it's not their fault, and I don't want the clients to go mad because I ruined their two-week simulation just because gnome-settings-daemon is misbehaving.
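For illustration, the detection step could look roughly like the minimal Python sketch below. It is not the actual tooling: it assumes that cumulative per-client operation counters are already being sampled somehow (the data source is left open), and the function name `noisy_clients` and the 1000 ops/s default threshold are hypothetical choices. It only flags clients; it does not kill anything.

```python
from typing import Dict, List, Tuple


def noisy_clients(
    prev: Dict[str, int],
    curr: Dict[str, int],
    interval_s: float,
    threshold_ops: float = 1000.0,  # assumed threshold, tune for your site
) -> List[Tuple[str, float]]:
    """Return (client, ops_per_second) pairs above the threshold, busiest first.

    prev and curr map a client address to its cumulative NFS operation
    count at two sampling times, interval_s seconds apart (hypothetical
    input format).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    flagged = []
    for client, count in curr.items():
        # A client with no earlier sample has no baseline, so its delta is 0.
        delta = count - prev.get(client, count)
        if delta < 0:
            # Counter went backwards (e.g. reset or remount): skip this round.
            continue
        rate = delta / interval_s
        if rate > threshold_ops:
            flagged.append((client, rate))

    # Busiest client first, so the worst offender is looked at first.
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged
```

Flagging rather than killing leaves room for a softer reaction, such as notifying the user first.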
So, before going down this not-so-good path, are there any good practices or established mechanisms for preventing NFS denial of service?