3

I have a caching-only DNS server which gets ~3k queries per second. Here are the specs:

Xeon dual-core 2.8 GHz, 4 GB of RAM
CentOS 5.x (kernel 2.6.18-164.15.1.el5PAE)
BIND 9.4.2

rndc status: recursive clients: 666/4900/5000

About 300 new queries (not in cache) per second.

BIND always uses 100% of one core in the single-threaded build. After I recompiled it with threading enabled, it uses nearly 200% across the two cores :( There is no iowait, only sys and user time. I searched around but didn't find any information about how BIND uses CPU. Why does it become a bottleneck?

One more thing, here is RAM usage:

cat /proc/meminfo 
MemTotal:      4147876 kB
MemFree:       1863972 kB
Buffers:        143632 kB
Cached:         372792 kB
SwapCached:          0 kB
Active:        1916804 kB
Inactive:       276056 kB

I've set max-cache-size to 0 to make sure BIND can use as much RAM as it wants, but it always stops at ~2 GB. Since we get uncached queries every second, RAM should theoretically be exhausted eventually, but it isn't.

Do you have any idea?

TIA,

-Gk

Gk.

4 Answers

2

Which version of BIND are you using? Versions before BIND 9.5 have known scalability problems under high load; see https://www.dns-oarc.net/files/dnsops-2007/Graff-BIND9-cache.pdf

Besides:

  • never set max-cache-size to 0 unless you want to open your server to DoS
  • the maximum size taken by your cache is always bounded by the TTLs of the actual records, since records are dropped once their TTLs expire
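
To see why the TTL bound keeps RAM from filling up, here is a rough back-of-the-envelope sketch. It assumes each cache miss adds one record that stays for its TTL and then expires, so the number of live entries is roughly arrival rate times lifetime. The 300 misses per second comes from the question; the mean TTL and the per-record size are made-up illustrative numbers, not measurements of BIND.

```python
def steady_state_cache_bytes(new_records_per_sec, mean_ttl_sec, bytes_per_record):
    """Estimate cache size once expiries balance insertions.

    Live entries ~= arrival rate * lifetime (the TTL), so the cache
    levels off instead of growing forever.
    """
    live_entries = new_records_per_sec * mean_ttl_sec
    return live_entries * bytes_per_record


# 300 misses/s is from the question; the 1-hour mean TTL and 512 bytes
# per cached record are assumptions chosen only for illustration.
estimate = steady_state_cache_bytes(300, 3600, 512)
print(f"~{estimate / 2**20:.0f} MiB at steady state")
```

With longer TTLs or larger records the plateau moves up, but it stays a plateau: the cache only grows until expiries catch up with new insertions.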

I recommend you perform a side test with dnscache from djbdns: it takes 10 minutes to install, is extremely simple to tune and maintain, and has predictable performance.

michele
  • The question is 2 years old and someone else already told him to update years ago... – Chris S May 26 '12 at 02:38
  • 1
    The age is of little relevance if the question is still applicable. Regarding the answer, previous folks told him to upgrade generally. My answer points to the specific problem inside the version and why it affects performance. – michele May 27 '12 at 05:28
0

Interesting problem... I've never seen bind use 100% CPU, but a quick search turned up a very interesting page that may help you fix the problem... Let me know how it turns out. I'm interested to know the outcome.

solefald
  • Thanks for the link. I read it several times, but because our current setup is complex, I'd like to keep running multiple instances of bind as a last resort. – Gk. Apr 20 '10 at 16:49
  • Gk, as far as your 2 GB memory limit goes, it is because your `bind` is compiled as a 32-bit application. If the hardware allows, you need to recompile it as 64-bit to take advantage of the extra memory. However, `bind` has a hard-coded limit of 4 GB, so if you ever want to go over that, you will have to hack the source. – solefald Apr 20 '10 at 17:03
0

3k qps for a server of that class is relatively low volume in raw I/O and memory bandwidth terms - I'd expect to be able to get nearer 20k if it was an authoritative server.

That said, BIND 9.4.2 is old. If you can roll your own or use non-RHEL RPMs you really should try BIND 9.7.x instead and see if that solves your performance issues.

Also, to use more than 2GB of RAM you'd need to be running on x64 in 64-bit mode rather than x86.
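
If you are not sure how a given binary was built, one quick check is to read the ELF header: the fifth byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. Below is a minimal sketch of that check. The helper name `elf_class` is made up for illustration, and the demo line inspects the running Python interpreter only because that file is guaranteed to exist; for the real check you would pass the path to your named binary (often /usr/sbin/named, but that path is an assumption). Note that the PAE kernel in the question is a 32-bit kernel, so a 64-bit build would also need a 64-bit OS install.

```python
import sys


def elf_class(path):
    """Return '32-bit' or '64-bit' for an ELF file, or None otherwise."""
    with open(path, "rb") as f:
        header = f.read(5)
    # An ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: "32-bit", 2: "64-bit"}.get(header[4])


# Demo on the current interpreter; replace with the path to named.
print(elf_class(sys.executable))
```

On a non-ELF platform the demo simply prints None.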

Alnitak
  • Thanks for your answer. I got some random crashes when I updated to bind 9.6.x, but I'll give 9.7.x a try. I'll post this question to the bind-users mailing list and update here as soon as I get an answer. – Gk. Apr 26 '10 at 02:20
0

You will probably get much better performance with Unbound. If you are using BIND only as a caching recursive server with nothing special in the configuration, switching to Unbound will be really easy.

snap