This may also be a problem with the BIND9 DNS server. I found that the server's query rate sometimes drops suddenly to 0 queries per second and stays there for several seconds, usually more than 10. However, the rate of incoming DNS queries never drops below 500 per second.
I added some debug logging to the BIND9 source and ran it again. I noticed that, periodically, the thread waits for a few seconds before recvmsg() returns. While it hangs, the Recv-Q fills up. The version is BIND-9.9.6, but unfortunately the problem is still there after switching to BIND-9.9.9-P4.
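To correlate the stalls with the socket queue, the Recv-Q can be sampled directly from /proc/net/udp rather than by eye. The following is only a minimal sketch: parseUDPLine is a hypothetical helper name, port 53 is assumed to be the one the server listens on, and the parsing assumes the usual Linux column layout (local_address in column 2, tx_queue:rx_queue in column 5, all in hex).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseUDPLine (hypothetical helper) extracts the local port and the
// rx_queue value from one data row of /proc/net/udp.
func parseUDPLine(line string) (port uint64, rxQueue uint64, err error) {
	fields := strings.Fields(line)
	if len(fields) < 5 {
		return 0, 0, fmt.Errorf("short line: %q", line)
	}
	// fields[1] is local_address, formatted as HEXIP:HEXPORT.
	local := strings.Split(fields[1], ":")
	if len(local) != 2 {
		return 0, 0, fmt.Errorf("bad local address: %q", fields[1])
	}
	if port, err = strconv.ParseUint(local[1], 16, 16); err != nil {
		return 0, 0, err
	}
	// fields[4] is tx_queue:rx_queue, both in hex.
	queues := strings.Split(fields[4], ":")
	if len(queues) != 2 {
		return 0, 0, fmt.Errorf("bad queue field: %q", fields[4])
	}
	if rxQueue, err = strconv.ParseUint(queues[1], 16, 64); err != nil {
		return 0, 0, err
	}
	return port, rxQueue, nil
}

func main() {
	f, err := os.Open("/proc/net/udp") // Linux only
	if err != nil {
		fmt.Println("cannot read /proc/net/udp:", err)
		return
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	sc.Scan() // skip the header row
	for sc.Scan() {
		port, rx, err := parseUDPLine(sc.Text())
		if err == nil && port == 53 { // assumption: the DNS server uses port 53
			fmt.Printf("port %d rx_queue=%d\n", port, rx)
		}
	}
}
```

Running this once a second during a stall should show whether rx_queue climbs to the socket buffer limit at the same moment the query rate drops to 0.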
I then tried another DNS server, written in Golang, and the problem never occurred. The query rate stays stable, as it should.
I think the ReadFromUDP() function that the Golang server uses just wraps the recvfrom() syscall, and recvfrom() wouldn't block there either. But ReadFromUDP() seems to keep retrying recvfrom() and returns once data has been read successfully, while BIND9 uses epoll. I'm not sure whether there is actually any difference between the two cases.
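For reference, the kind of receive loop described above can be sketched as follows. This is not the code of the Golang server I tested. The names countDatagrams and sendAndCount are made up for illustration, and the loopback address and 512-byte buffer are assumptions. As far as I know, Go's net package puts the socket into non-blocking mode and parks the goroutine on the runtime's poller (epoll on Linux) whenever recvfrom() has no data, so the loop is not a busy retry even though it looks like a plain blocking read.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// countDatagrams reads up to limit datagrams from conn and returns how many
// arrived before a read failed (for example, because the deadline expired).
// Each ReadFromUDP call blocks only the calling goroutine.
func countDatagrams(conn *net.UDPConn, limit int) int {
	buf := make([]byte, 512) // assumption: classic DNS-over-UDP message size
	received := 0
	for received < limit {
		if _, _, err := conn.ReadFromUDP(buf); err != nil {
			break
		}
		received++
	}
	return received
}

// sendAndCount (hypothetical helper) sends n datagrams to a loopback
// listener and reports how many the read loop received.
func sendAndCount(n int) (int, error) {
	server, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1), Port: 0})
	if err != nil {
		return 0, err
	}
	defer server.Close()

	client, err := net.DialUDP("udp", nil, server.LocalAddr().(*net.UDPAddr))
	if err != nil {
		return 0, err
	}
	defer client.Close()

	for i := 0; i < n; i++ {
		if _, err := client.Write([]byte("query")); err != nil {
			return 0, err
		}
	}

	// Bound the wait so the loop cannot hang if a datagram is lost.
	if err := server.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return 0, err
	}
	return countDatagrams(server, n), nil
}

func main() {
	got, err := sendAndCount(5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("datagrams received:", got)
}
```

If that understanding is right, both servers ultimately wait on epoll, so the difference is more likely in how each one drains the socket once it becomes readable than in the wait mechanism itself.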
I'm using CentOS 6.4, and the kernel version is 2.6.32-358-el6.x86_64.
Has anyone encountered this kind of problem?