
Does anyone happen to know what the theoretical size limit of /etc/hosts is on a Linux system before you might start to see degradation in performance?

Furthermore, can anyone point me towards some official source that states what the expected limit is?

Desperatuss0ccus
MikeP90
  • This makes me think you're doing something crazy or WAY outside of best practices. What are the details? – ewwhite Feb 24 '16 at 20:48
  • Sure seems like deploying a lightweight DNS resolver might be a better solution here. – Zoredache Feb 25 '16 at 00:42
  • I have a customer who's requesting this. I was hoping to find some documentation I could show them explaining why this will cause issues, instead of having to try it on a test machine and demonstrate it. – MikeP90 Feb 25 '16 at 13:25
  • The hosts file is a relic of the pre-DNS days of the 1970s and early 1980s. Having hundreds of entries in a hosts file was recognized as a bad idea _that far back_. If you've got more than 10 entries in yours, you're probably on the wrong track. – Michael Hampton Feb 25 '16 at 23:56

3 Answers


Use the source, Mike.

The resolver uses a linear search through the text file to locate entries. It's a database with no indexes. So, in the absence of additional caching capability, the cost of a lookup will be O(n). As to when that will result in a degradation in performance, that's an impossible question to answer - it simply gets slower with every record.

If you talk to a database programmer or admin you'll get different figures for the point at which an indexed lookup (O(log2(n))) becomes cheaper than a full table scan, but generally the answer will be in the region of 20 to 100 records.
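To put a rough number on that crossover, here is a minimal Python sketch comparing a linear scan against a pre-built index over the same data. Everything in it is synthetic: the entry count, names, and addresses are invented for illustration, and a dict is a hash index rather than the tree index the O(log2(n)) figure assumes.

    import time

    N = 100_000  # assumed number of host entries, purely for illustration
    entries = [("10.%d.%d.%d" % (i >> 16 & 255, i >> 8 & 255, i & 255),
                "host%d.example" % i) for i in range(N)]
    index = {name: ip for ip, name in entries}   # the pre-built "index"

    target = "host%d.example" % (N - 1)          # worst case for the scan

    t0 = time.perf_counter()
    ip_scan = next(ip for ip, name in entries if name == target)  # O(n) scan
    t1 = time.perf_counter()
    ip_idx = index[target]                                        # indexed lookup
    t2 = time.perf_counter()

    print("linear scan: %.6f s   indexed lookup: %.6f s" % (t1 - t0, t2 - t1))

On typical hardware the scan cost keeps growing with the number of entries while the indexed lookup stays essentially flat, which is the whole argument against a very large hosts file.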

Any Linux system that needs to resolve a lot of names (not just hostnames) should be running nscd or a similar cache. Most such caches index the data themselves, which sidesteps the performance question.
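On glibc systems the nscd host cache is configured in /etc/nscd.conf; the hosts-related lines usually look something like this (the values shown are illustrative and vary by distribution):

    # /etc/nscd.conf - hosts cache settings (illustrative values)
    enable-cache            hosts   yes
    positive-time-to-live   hosts   3600
    negative-time-to-live   hosts   20
    suggested-size          hosts   211
    check-files             hosts   yes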

However, the hosts file provides no means of managing complex or large datasets: if a host has more than one IP address, lookups via the hosts file will always return the first entry.
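As a rough illustration of that behaviour - this is a simplified stand-in for the resolver's "files" backend, not its actual code - a lookup amounts to scanning the file top to bottom and stopping at the first line whose names include the one requested:

    def lookup(name, hosts_path="/etc/hosts"):
        # Return the address from the first matching line; later lines are never reached.
        with open(hosts_path) as f:
            for line in f:
                fields = line.split("#", 1)[0].split()   # strip comments, tokenize
                if len(fields) >= 2 and name in fields[1:]:
                    return fields[0]                      # first entry wins
        return None

    print(lookup("localhost"))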

symcbean
  • To close the loop, we added 1.7 million records to the hosts file and estimated that it added 0.5 seconds to each lookup. In this environment, 0.5 seconds is negligible. I think a DNS server is still a better solution, but the customer wants what the customer wants. – MikeP90 Apr 06 '16 at 18:02

A bit of Internet history -- before DNS was deployed in 1984, the hosts file was the only way to resolve names, and there were not a lot of hosts on the network -- 325 in February 1983 (RFC 847). There are copies of HOSTS.TXT (not machine-readable, though) from 1982 in the archive of the internet-history mailing list. There was even an alternate HOSTS.TXT (Geoff Goodfellow's).

sendmoreinfo

Technically, there's no upper bound. However, every name lookup is going to hit this file first, so why leave yourself open to that?
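That's because the hosts line in /etc/nsswitch.conf on a typical glibc system lists the file before DNS, so the files backend is consulted on every lookup, e.g.:

    # /etc/nsswitch.conf (typical default on glibc systems)
    hosts:      files dns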

For what it's worth, the largest /etc/hosts file I've distributed in my environments was 1,200 lines. And that worked well for the application I was managing. DNS was not an option in that particular environment.

ewwhite
  • Let's put it another way. If there's no indexing in the kernel, each hit would do a linear search, and the timing will depend on the cache size. – Deer Hunter Feb 25 '16 at 02:09
  • I use a popular hosts file found on the internet; it has 15,430 lines and I notice no real degradation in web surfing performance. – Bert Feb 25 '16 at 21:37
  • @DeerHunter I don't think there's anything in the Unix kernel that performs hostname lookup. – Barmar Mar 01 '16 at 17:54
  • +1 to Bert's note. I just used a custom file with 22,000 lines and it has not impacted performance. This is useful for testing purposes! – Josh koenig May 17 '17 at 17:38