Why does Windows provide different code pages for console and non-console?
Because of backwards compatibility with MS-DOS applications, which can still run on 16-bit and 32-bit Windows, and many of which have also been ported to the Windows console. Besides, Alt codes from DOS are so deeply ingrained in users that they would complain if they could no longer type their favorite special characters, so a DOS code page is a must.
DOS originally used code page 437, which was built into the EGA and VGA ROM. But later ISO and IEC came together to define new standard code pages, so Microsoft quickly jumped in and used code page 1252 for Windows, which was based on an early draft of what later became ISO 8859-1.
"The source of this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft, which became ISO Standard 8859-1."
(Quoted from: Why is the default 8-bit codepage called "ANSI"?)
In fact, Microsoft was always an early adopter. For example, it was the first to adopt a Korean standard and the first to use Unicode, both of which it would later regret. The former was never used by anyone else, and the latter, adopted in its early 16-bit form, makes writing portable code difficult. Everyone else came later and used the newer and better UTF-8 encoding instead.
Microsoft takes backwards compatibility very seriously, so when it introduced the new Windows code page it couldn't change the behavior of console apps. The change could only be made for GUI apps. As a result, legacy Windows GUI apps (from before the advent of Unicode) use the ANSI code page, while a separate code page is still maintained for console apps. A different way to enter special characters also had to be introduced, and the methods are distinguished by the first numpad key pressed after Alt:
If it's numpad 1-9, the DOS code page (a.k.a. the OEM code page) is used. Alt+7 produces code 7, which CP437 displays as "•" (U+2022)
If it's numpad 0, the Windows code page (a.k.a. the ANSI code page) is used. Alt+0149 produces code 149, which is the same "•" (U+2022) in CP1252
If it's numpad +, the input is a hexadecimal UCS-2/UTF-16 code unit. This is the new behavior for newer Windows GUI apps that use Unicode. Typing Alt++2022 gives you the same "•" (U+2022) character
Note that the hex numpad has to be enabled first: set a REG_SZ value named EnableHexNumpad to 1 in the HKCU\Control Panel\Input Method registry key, then reboot.
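To make the three rules above concrete, here is a rough Python sketch of how the first numpad key selects the code page. The function name alt_code and the small glyph table are my own for illustration, not anything Windows exposes. Python's cp437 codec maps bytes below 32 to control characters rather than to the glyphs DOS displays, so a few of those glyphs are listed by hand, and that table is deliberately incomplete.

```python
# A few glyphs that CP437 displays for control-range bytes. Python's cp437
# codec decodes these bytes to control characters, so they are listed by hand.
# Illustrative and incomplete.
CP437_LOW_GLYPHS = {1: "\u263A", 2: "\u263B", 3: "\u2665", 7: "\u2022"}


def alt_code(keys: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Simulate Alt+<keys>, where keys is what was typed on the numpad."""
    if keys.startswith("+"):
        # Alt++2022: hexadecimal Unicode input
        return chr(int(keys[1:], 16))
    value = int(keys)
    if keys.startswith("0"):
        # Alt+0149: leading zero selects the ANSI (Windows) code page
        return bytes([value]).decode(ansi)
    if oem == "cp437" and value in CP437_LOW_GLYPHS:
        return CP437_LOW_GLYPHS[value]
    # Alt+130: no leading zero selects the OEM (DOS) code page
    return bytes([value]).decode(oem)


print(alt_code("7"))      # • via CP437
print(alt_code("0149"))   # • via CP1252
print(alt_code("+2022"))  # • via Unicode hex input
```

All three calls give U+2022. The sketch only handles values 0-255 for the decimal forms (bytes() raises ValueError otherwise) and does not try to reproduce what real Windows does with larger numbers.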
See also Which character encoding is used for ALT-codes?
How are these code pages determined per machine?
Each locale has four default code pages associated with it: OEM (DOS), ANSI (Windows), EBCDIC and Mac (classic), of which only the first two really matter nowadays. So with the default US locale you'll have CP437 and CP1252 as the DOS and Windows code pages respectively after installing Windows. But these can easily be changed, for example with chcp (for the current console), by an API call, or by editing the registry.
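If you want to check what a particular machine is actually using, something like the following should work. It calls the Win32 functions GetACP, GetOEMCP, GetConsoleCP and GetConsoleOutputCP from kernel32 through ctypes. The wrapper name current_code_pages is just mine, and it returns None on anything that isn't Windows.

```python
import ctypes
import sys


def current_code_pages():
    """Return the active code pages on Windows, or None on other platforms."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    return {
        "ansi": kernel32.GetACP(),                        # Windows (ANSI) code page
        "oem": kernel32.GetOEMCP(),                       # DOS (OEM) code page
        "console_input": kernel32.GetConsoleCP(),         # what chcp changes
        "console_output": kernel32.GetConsoleOutputCP(),
    }


print(current_code_pages())
```

The two console values report 0 when the process has no console attached, and they can differ from the OEM default if someone has run chcp in that console.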
What is the relation between code pages on the same machine? Is there a correlation between the console and non console code pages?
The only relation between them is that both defaults come from the locale.
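Apart from that, they are independent byte-to-character mappings. A quick Python check shows the same bytes decoding to different characters under the US defaults (the helper name decode_both is only for this example):

```python
def decode_both(raw: bytes, oem: str = "cp437", ansi: str = "cp1252"):
    """Decode the same bytes with the OEM and the ANSI code page."""
    return raw.decode(oem), raw.decode(ansi)


oem_text, ansi_text = decode_both(bytes([0x82, 0xE9]))
print(oem_text)   # éΘ
print(ansi_text)  # ‚é

# The same character also gets a different byte in each code page.
print("é".encode("cp437"))   # b'\x82'
print("é".encode("cp1252"))  # b'\xe9'
```

This is why text written by a console program often looks garbled when opened in a legacy GUI program, and vice versa.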
Will machines with codepage 1252 always have console codepage of 437?
No, because the code pages can be changed by the user, as I said. Besides, there are non-US locales that also use CP1252 but have a different DOS code page by default. Western European locales, for example, typically default to CP850.
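As a small illustration, here is a hand-written table of the usual (ANSI, OEM) defaults for a few locales, grouped by ANSI code page. The table is typed in from memory of the common defaults, not read from Windows, so treat it as an example rather than a reference.

```python
# Usual default (ANSI, OEM) code pages for a few locales.
# Hand-written for illustration; check your own system for the real values.
DEFAULTS = {
    "en-US": (1252, 437),
    "en-GB": (1252, 850),
    "de-DE": (1252, 850),
    "ru-RU": (1251, 866),
}


def oem_pages_for_ansi(ansi: int) -> list:
    """List the distinct OEM code pages paired with a given ANSI code page."""
    return sorted({oem for a, oem in DEFAULTS.values() if a == ansi})


print(oem_pages_for_ansi(1252))  # [437, 850]
```

So even before any user changes, knowing that the ANSI code page is 1252 does not tell you whether the console uses 437 or 850.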