
What encoding does the Windows 'hosts' file use? Is it UTF-8, or ASCII plus the system code page? How should IDN entries (internationalized domain names, with umlauts etc.) be added, and can they be added at all?

deceze
noober
  • What's provoking the question? – Bill_Stewart Aug 22 '14 at 14:39
  • I'm writing a GUI editor for this file. Of course, a very special one. Now I'm looking for the right encoding for serialization purposes. – noober Aug 22 '14 at 14:44
  • IDNs should probably be stored in their canonical Punycode encoded form, the rest is then just ASCII; hence the question is pretty moot. – deceze Aug 22 '14 at 14:49
  • Malware commonly wants to edit the hosts file. Not saying your app is malware, but many anti-malware apps will probably think it is. – Bill_Stewart Aug 22 '14 at 15:49
    It's an internal GUI tool for our company sysadmins. I hope they know what they're doing. – noober Aug 22 '14 at 16:01
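deceze's suggestion (store IDNs in their Punycode form so the file stays pure ASCII) can be sketched with Python's built-in "idna" codec. This is a minimal illustration, not a tested recipe for the Windows resolver: `hosts_line` is a hypothetical helper, and the built-in codec implements IDNA 2003 (RFC 3490), so IDNA 2008 edge cases would need the third-party `idna` package instead.

```python
def hosts_line(ip: str, hostname: str) -> str:
    """Return a hosts-file line with the hostname in ASCII (Punycode/ACE) form.

    Hypothetical helper. Uses Python's built-in "idna" codec (IDNA 2003,
    RFC 3490), which converts each non-ASCII label to an "xn--" label.
    """
    ascii_name = hostname.encode("idna").decode("ascii")
    return f"{ip}\t{ascii_name}"


print(hosts_line("127.0.0.1", "münchen.example"))
```

Plain ASCII names pass through unchanged, so the whole file can then be saved as ASCII and the encoding question largely disappears.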

2 Answers


It should be ANSI or UTF-8 without BOM. I just dealt with a server that had the hosts file encoding set to UCS-2 Little Endian, and that led to the file being ignored.

There is a wealth of information here: https://serverfault.com/questions/452268/hosts-file-ignored-how-to-troubleshoot
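For an editor like the one in the question, a minimal sketch of flagging a UCS-2/UTF-16 file before it causes trouble is to look at the leading byte order mark. This assumes the file carries a BOM, as files saved as "UCS-2 Little Endian" normally do; `sniff_hosts_encoding` and its labels are hypothetical, and the no-BOM branches are only heuristics.

```python
import codecs


def sniff_hosts_encoding(data: bytes) -> str:
    """Heuristically label the encoding of raw hosts-file bytes (hypothetical helper)."""
    # Byte order marks first: they are unambiguous when present.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE; UTF-32 LE also starts this way
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    # No BOM: pure ASCII is byte-identical in ANSI and UTF-8.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # Valid UTF-8 without BOM (ANSI text can occasionally pass this check too).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "ansi-or-unknown"
```

An editor could warn whenever the result is `utf-16-le` or `utf-16-be`, since that is the case described above where the file was ignored.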

nealibob

The simple answer is ANSI or UTF-8 WITH BOM.

(UTF-8 without BOM is NOT valid).


Details:

As far as I have tested, the hosts file on Windows should be encoded as ANSI or UTF-8 with BOM.

I know this question is many years old, but a colleague made the mistake of looking at this post and the ServerFault post, so I decided to add an answer.

1. Simple case: ASCII only

Works.


With no multi-byte characters, the file is byte-for-byte the same whether it is saved as ANSI or as UTF-8 without BOM.
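A quick way to see why case 1 cannot distinguish the encodings: an ASCII-only line produces identical bytes under ASCII, the common ANSI code pages, and UTF-8. The helper name `same_bytes` is just for illustration.

```python
def same_bytes(text: str, encodings: list) -> bool:
    """Return True if `text` encodes to identical bytes under every codec listed."""
    return len({text.encode(enc) for enc in encodings}) == 1


line = "127.0.0.1\tlocalhost\r\n"
# cp1252 is the Western European ANSI code page, cp932 the Japanese one.
print(same_bytes(line, ["ascii", "cp1252", "cp932", "utf-8"]))  # True
print(same_bytes("127.0.0.1\tテスト\r\n", ["cp932", "utf-8"]))  # False
```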

2. ANSI (with Japanese ANSI multi-byte characters)

Works.


Note: the file contains Japanese characters, but this is still valid ANSI encoding on Windows.

In Japanese editions of Windows, code page cp932 is what is referred to as "ANSI":

https://en.wikipedia.org/wiki/Code_page_932_(Microsoft_Windows)
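To make "ANSI means the system code page" concrete, the sketch below encodes the same katakana name under cp932 and UTF-8, then decodes the cp932 bytes with the Western code page cp1252. It only demonstrates codec behavior; it is an assumption, not something tested in this answer, that Windows reads a BOM-less hosts file with the system code page.

```python
name = "テスト"  # "test" written in katakana

ansi_bytes = name.encode("cp932")  # Japanese "ANSI": 2 bytes per character here
utf8_bytes = name.encode("utf-8")  # UTF-8: 3 bytes per character here

print(ansi_bytes.hex(" "))  # 83 65 83 58 83 67
print(utf8_bytes.hex(" "))  # e3 83 86 e3 82 b9 e3 83 88
# The same cp932 bytes read with a different code page turn into mojibake.
print(ansi_bytes.decode("cp1252"))  # ƒeƒXƒg
```

So an "ANSI" hosts file is only meaningful together with the code page of the machine that reads it.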

3. UTF-8 with BOM

Works.


Note: "BOM 付き" means "with BOM".
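If an editor wants to produce the working variant from case 3, Python's "utf-8-sig" codec writes the EF BB BF byte order mark automatically. A minimal sketch: `save_hosts` is hypothetical, it writes to a temporary stand-in file rather than the real %SystemRoot%\System32\drivers\etc\hosts (which needs administrator rights to modify), and the CRLF line endings are an assumption, since the tests above did not vary line endings.

```python
import codecs
import tempfile
from pathlib import Path


def save_hosts(path: Path, text: str) -> None:
    """Write hosts content as UTF-8 with BOM and CRLF line endings (hypothetical helper)."""
    # "utf-8-sig" emits the EF BB BF byte order mark before the text.
    # newline="\r\n" translates every "\n" written into CRLF.
    with open(path, "w", encoding="utf-8-sig", newline="\r\n") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "hosts"  # stand-in for the real hosts file
    save_hosts(target, "127.0.0.1\tテスト.example\n")
    data = target.read_bytes()

print(data.startswith(codecs.BOM_UTF8))  # True
```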

4. UTF-8 without BOM

DOES NOT work.
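Given this result, a file that is already UTF-8 without BOM could be repaired by prefixing the BOM. A minimal sketch with a hypothetical helper that leaves files that already have a BOM untouched and refuses bytes that are not valid UTF-8 (for example ANSI/cp932 text, which should not be relabeled):

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Prefix UTF-8 bytes with a BOM unless one is already present (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    data.decode("utf-8")  # raises UnicodeDecodeError if the bytes are not UTF-8
    return codecs.BOM_UTF8 + data
```

Calling it twice is harmless, which matters for an editor that may save the same file repeatedly.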


5. Additional test cases

If you use emoji instead of Japanese, the result will be the same.

Emoji saved as UTF-8 without BOM does not work. (However, other lines that do not contain emoji may still work correctly.)


Emoji saved as UTF-8 with BOM resolves the host correctly.
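Putting the test results together, an editor could pick the save encoding per file: keep ANSI when every character fits the code page, otherwise fall back to UTF-8 with BOM. This is a sketch of one possible policy, not something the answer prescribes; preferring ANSI is a design choice that keeps ASCII-only files unchanged, and `ansi_codepage` is an assumption that must match the target machine (cp932 on Japanese Windows, cp1252 on Western European Windows, etc.).

```python
def choose_hosts_encoding(text: str, ansi_codepage: str = "cp932") -> str:
    """Return a codec name for saving `text`: ANSI if it fits, else UTF-8 with BOM.

    Hypothetical helper; `ansi_codepage` must match the machine's ANSI code page.
    """
    try:
        text.encode(ansi_codepage)
        return ansi_codepage
    except UnicodeEncodeError:
        # Characters such as emoji have no representation in the ANSI code page.
        return "utf-8-sig"
```

The returned name can be passed straight to `open(..., encoding=...)` when writing the file.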

Note: if you use Notepad to check this yourself, be sure to put the file name in double quotes when saving, or Notepad will create hosts.txt.


Addendum (asked in a comment): the hosts file supports inline comments.
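For an editor that needs to read entries back, inline comments mean everything after a `#` should be dropped before splitting the line. A minimal sketch under two assumptions: `#` starts a comment anywhere on a line (as the addendum states), and hostnames never contain `#`. `parse_hosts_line` is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_hosts_line(line: str) -> Optional[Tuple[str, List[str]]]:
    """Split one hosts line into (address, hostnames), ignoring '#' comments.

    Returns None for blank or comment-only lines.
    """
    content = line.split("#", 1)[0].strip()  # drop an inline or full-line comment
    if not content:
        return None
    address, *names = content.split()  # fields are separated by spaces or tabs
    return address, names
```

For example, `parse_hosts_line("127.0.0.1  example.test  # local dev")` yields `("127.0.0.1", ["example.test"])`.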

fliedonion
  • I don't think the hosts file supports inline comments; comments have to be on their own line. So that could be affecting results. – Triynko Mar 11 '22 at 19:34
  • @Triynko Microsoft shows it as an example in the hosts file. (Added that to my answer.) – fliedonion Mar 29 '22 at 15:22