7

Microsoft Windows provides several functions to query the current code page: GetACP, GetConsoleOutputCP and GetConsoleCP.

They return different values. For example, on my machine, GetACP returns 1252 while GetConsoleOutputCP and GetConsoleCP return 437.

(Running chcp on the command line also reports 437.)
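For reference, a minimal sketch of how these functions could be queried side by side from Python through ctypes. GetOEMCP is not mentioned above; it is a real kernel32 function that returns the system OEM code page, and is included here for comparison. The helper name query_code_pages is hypothetical, and the function simply returns None when not running on Windows.

```python
import ctypes
import sys


def query_code_pages():
    """Return the code page numbers reported by kernel32, or None off Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    return {
        "GetACP": kernel32.GetACP(),                  # system ANSI code page
        "GetOEMCP": kernel32.GetOEMCP(),              # system OEM code page
        "GetConsoleCP": kernel32.GetConsoleCP(),      # console input code page
        "GetConsoleOutputCP": kernel32.GetConsoleOutputCP(),  # console output
    }


if __name__ == "__main__":
    print(query_code_pages())
```

Note that the two console functions return 0 when the process has no console attached.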

  • Why does Windows provide different code pages for console and non-console?
  • How are these code pages determined per machine?
  • What is the relation between code pages on the same machine? Is there a correlation between the console and non console code pages? Will machines with codepage 1252 always have console codepage of 437?

The background for this question is an error message from Visual Studio C++:

error C2855: command-line option '/source-charset' inconsistent with precompiled header
error C2855: command-line option '/execution-charset' inconsistent with precompiled header

These errors occurred when the precompiled header file was built with a different default code page than the CPP file that used it (for whatever reason).
From the MSDN docs:

If no byte-order mark is found, it assumes the source file is encoded using the current user code page, unless you specify a character set name or code page by using the /source-charset option.

So I'm trying to figure out which code page they refer to, the one that is returned by GetACP or the others...
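To make the quoted rule concrete, here is a small sketch of "use the BOM if there is one, otherwise fall back to a code page". It is only an illustration of the documented behavior, not the compiler's actual implementation; the function name decode_source and the fallback_code_page parameter are hypothetical.

```python
import codecs

# BOMs checked in this sketch, mapped to Python codec names.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_source(raw: bytes, fallback_code_page: str = "cp1252") -> str:
    """Decode raw bytes using the BOM if present, else a fallback code page."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    # No BOM: the result now depends entirely on the assumed code page.
    return raw.decode(fallback_code_page)


if __name__ == "__main__":
    print(decode_source(b"\xe9"))           # interpreted as CP1252
    print(decode_source(b"\xe9", "cp437"))  # same byte, interpreted as CP437
```

The same BOM-less byte decodes to different characters under different fallbacks, which is why two builds that assume different default code pages are treated as inconsistent.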

phuclv
Amir Gonnen
  • 1
    Compatibility: the console subsystem was meant to help port programs written in the MS-DOS days, back when Microsoft made a living selling DOS to the many OEMs. Code page 437 and the raster fonts recreate the original IBM PC character set. 850 is common in continental Europe, with more glyphs to display diacritics. The next opportunity to redesign code pages, with more emphasis on rendering text correctly, came with Windows. – Hans Passant Mar 08 '23 at 14:30

4 Answers

15

The ANSI and OEM codepages are determined by the system locale that's loaded when the system boots. They get mapped into every process as the PEB fields AnsiCodePageData and OemCodePageData. The runtime library in ntdll.dll has many functions that work with these string types, e.g. RtlAnsiStringToUnicodeString and RtlOemStringToUnicodeString.

Functions ending with A in the Windows API are ANSI, except file system functions can be switched to OEM via SetFileApisToOEM. The console API defaults to OEM for compatibility with legacy applications, and can be changed to another codepage via SetConsoleCP and SetConsoleOutputCP. chcp.com (or mode.com) calls these functions, but it doesn't allow setting the input buffer and screen buffer to different codepages.

If the ANSI codepage is 1252, the OEM codepage isn't necessarily 437. That's only in the U.S. locale. Most Western locales that use 1252 as the ANSI codepage will use 850 as the OEM codepage.
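As a rough illustration of how these three codepages differ, the sketch below encodes a character in each of them using Python's built-in cp1252, cp437 and cp850 codecs. The helper name encode_in is hypothetical.

```python
def encode_in(text: str, code_pages=("cp1252", "cp437", "cp850")) -> dict:
    """Map each code page to the encoded bytes, or None if not representable."""
    result = {}
    for cp in code_pages:
        try:
            result[cp] = text.encode(cp)
        except UnicodeEncodeError:
            result[cp] = None  # the character does not exist in this code page
    return result


if __name__ == "__main__":
    print(encode_in("é"))  # same character, different byte in ANSI vs OEM
    print(encode_in("Ø"))  # present in 1252 and 850, absent from 437
```

The second call shows the kind of character that 850 adds for Western European languages and that 437 lacks.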

An application that says it's using the user code page may not be referring to the system ANSI or OEM codepage. Instead it could be calling, e.g., GetLocaleInfoEx to query the LOCALE_NAME_USER_DEFAULT locale for the LOCALE_IDEFAULTANSICODEPAGE or LOCALE_IDEFAULTCODEPAGE.

Eryk Sun
  • 2
    To the downvoter, if you downvote without an explanation, that's your prerogative. But it's more helpful to at least give a little bit of feedback to let me know what's wrong; if there's a way for me to improve the answer, or if your reasons are significant enough that I should delete this answer. – Eryk Sun Apr 03 '17 at 21:26
  • The downvoter is probably trolling. The question was downvoted as well without an explanation. Now the last thing you mentioned got me confused. Do we have other code pages in addition to ANSI and OEM? According to [this MSDN page](https://msdn.microsoft.com/en-us/library/windows/desktop/dd373761(v=vs.85).aspx) `LOCALE_IDEFAULTANSICODEPAGE` returns the ANSI code page and `LOCALE_IDEFAULTCODEPAGE` returns the OEM code page. In what case would they be different from the codepages returned by `GetACP`, `GetConsoleCP` etc.? – Amir Gonnen Apr 04 '17 at 06:03
  • As I mentioned the system locale is what gets used by the "A" suffixed functions in the Windows ANSI API. It's typically using the system ANSI codepage, but the file system APIs can be switched to the system OEM codepage, and the console defaults to OEM. The ANSI API is deprecated. Programs should be using the "W" suffixed Unicode API, and many new functions like `GetLocaleInfoEx` don't even have ANSI implementations. – Eryk Sun Apr 04 '17 at 06:23
  • Preferably text should be saved as UTF-8 or UTF-16 with a BOM, not using legacy codepages. But we don't live disconnected from the past, and codepages are still relevant in many cases. – Eryk Sun Apr 04 '17 at 06:27
3

The command console uses a different codepage for legacy reasons. The programs running on the console were often written for DOS, and the character set included things like line drawing characters that would be useful in this context. In a graphical environment with native Windows apps it was more important to expand the available characters since the lines would be drawn directly instead of being simulated in fonts.
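A quick sketch of the line-drawing point: the bytes below form the top edge of a double-line box in code page 437, but read as accented letters and punctuation in code page 1252. The names BOX_TOP and decode_both are made up for this example.

```python
# Top edge of a double-line box as a DOS program would have written it.
BOX_TOP = bytes([0xC9, 0xCD, 0xCD, 0xBB])


def decode_both(raw: bytes) -> dict:
    """Decode the same bytes as OEM (CP437) and as ANSI (CP1252)."""
    return {cp: raw.decode(cp) for cp in ("cp437", "cp1252")}


if __name__ == "__main__":
    for code_page, text in decode_both(BOX_TOP).items():
        print(code_page, text)
```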

The default code pages are determined by the language Windows will be using. Different languages require different characters, and a single code page wasn't enough to fit all of the characters used by European languages. You will find code page 1250 used in some Central and Eastern European locations for example.

Mark Ransom
2

Why does Windows provide different code pages for console and non-console?

Because of backwards compatibility with MS-DOS applications, which can still run on 16-bit and 32-bit Windows, and many of which were also ported to the Windows console. Besides, Alt codes from DOS are so deeply ingrained in users that they would complain if they could no longer type their favorite special characters, so a DOS code page is a must.

DOS originally used code page 437, which was built into the EGA and VGA ROM. Later, ISO and IEC came together to make new standard code pages, so Microsoft quickly jumped in and used code page 1252 for Windows, which was based on an early draft of what later became ISO 8859-1:

The source of this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft, which became ISO Standard 8859-1

Why is the default 8-bit codepage called "ANSI"?

In fact, Microsoft was always an early adopter. For example, it was the first to adopt a Korean standard and was the first to use Unicode, both of which would be regretted later. The former was never used by others, and the latter makes writing portable code difficult. Everyone else came later and used the newer and better UTF-8 instead.

Microsoft takes backwards compatibility very seriously, so when it introduced the new Windows code page it couldn't change the behavior of console apps. The change could only be made for GUI apps. As a result, legacy Windows GUI apps (from before the advent of Unicode) use ANSI code pages, and a separate code page is still maintained for console apps. A different way to enter special characters also had to be introduced; the method is selected by the first numpad key pressed after Alt:

  • If it's numpad 1-9 then the DOS code page (A.K.A OEM code page) will be used. Alt+7 will produce code point 7 (U+2022 "•" in CP437)

  • If it's numpad 0 then the Windows code page (A.K.A ANSI code page) will be used. Alt+0149 will produce code point 149 which is the same U+2022 "•" in CP1252

  • If it's numpad + then input is hexadecimal UCS2/UTF-16. This is the new behavior for new Windows GUI apps that use Unicode. Typing Alt++2022 gives you the same U+2022 "•" character

    Note that this requires the hex numpad to be enabled by setting a REG_SZ value named EnableHexNumpad in the HKCU\Control Panel\Input Method registry key and then rebooting.
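The three rules above can be modeled with a small sketch. It is a simplification, not how Windows implements Alt codes: the function name alt_code is hypothetical, decimal values are simply wrapped into one byte, and the CP437 glyphs for codes 1 to 31 are listed by hand because Python's cp437 codec maps those bytes to control characters.

```python
# CP437 display glyphs for codes 1..31 (hand-written table, see note above).
CP437_LOW_GLYPHS = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"


def alt_code(keys: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Return the character for the numpad keys typed while Alt is held."""
    if keys.startswith("+"):        # Alt++2022: hexadecimal UTF-16 code unit
        return chr(int(keys[1:], 16))
    value = int(keys) % 256         # simplification: wrap into a single byte
    if keys.startswith("0"):        # Alt+0149: ANSI (Windows) code page
        return bytes([value]).decode(ansi)
    if 1 <= value <= 31:            # Alt+7: OEM glyph, not a control code
        return CP437_LOW_GLYPHS[value - 1]
    return bytes([value]).decode(oem)  # Alt+130: OEM (DOS) code page


if __name__ == "__main__":
    for keys in ("7", "0149", "+2022"):
        print(f"Alt+{keys} ->", alt_code(keys))
```

All three inputs in the usage loop yield the bullet character described in the list. A few byte values are undefined in Python's cp1252 codec, so this sketch raises UnicodeDecodeError for those.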

See also Which character encoding is used for ALT-codes?


How are these code pages determined per machine?

Each locale has 4 different default associated code pages: OEM (DOS), ANSI (Windows), EBCDIC and Mac (classic) code pages, of which only the first 2 are actually important nowadays. So on the default US locale after installing Windows you'll have CP437 and CP1252 for DOS and Windows code pages respectively. But these can easily be changed: chcp or an API call changes the code page of the current console, and the system defaults can be changed through the system locale or by editing the registry.


What is the relation between code pages on the same machine? Is there a correlation between the console and non console code pages?

The only relation between them is that both come from the locale.


Will machines with codepage 1252 always have console codepage of 437?

No, because the code pages can be changed by the user, as I said. Besides, there may be non-US locales that also use CP1252 but use another DOS code page by default.

Daniel LB
phuclv
1

How are these code pages determined per machine?

Have a look at the table in the National Language Support (NLS) API Reference.

Or query your Registry:

C:\>reg query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /v OEMCP

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
    OEMCP    REG_SZ    850


C:\>reg query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /v ACP

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
    ACP    REG_SZ    1252
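
The same two values could also be read programmatically. Below is a minimal sketch using Python's standard winreg module; the helper name read_system_code_pages is hypothetical, and the function returns None when not running on Windows.

```python
import sys


def read_system_code_pages():
    """Return {'ACP': ..., 'OEMCP': ...} from the registry, or None off Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    key_path = r"SYSTEM\CurrentControlSet\Control\Nls\CodePage"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        # QueryValueEx returns (value, type); both values are REG_SZ strings.
        return {name: winreg.QueryValueEx(key, name)[0] for name in ("ACP", "OEMCP")}


if __name__ == "__main__":
    print(read_system_code_pages())
```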
Wernfried Domscheit