With the default "C" locale, only a-z are transformed by std::toupper(), as documented for example here. Which characters exactly are transformed by std::ctype<CharT>::toupper() with the default C++ locale?

I'm asking because std::toupper(L'ω', std::locale::classic()) returns L'Ω' on Windows, and I'm wondering for which other characters the C++ locale also returns an uppercase form. In the "C" locale, the same character is not transformed: static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(L'ω'))) returns L'ω' as expected.

I used the following program to verify this:

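#include <cwctype>
#include <fstream>
#include <locale>

int main()
{
  // Imbue a UTF-8 locale so that wide characters are encoded as UTF-8 in out.txt.
  std::wofstream fs("out.txt");
  fs.imbue(std::locale("en_US.UTF8"));
  // C++ locale API: two-argument std::toupper() from <locale> with the classic locale.
  fs << L"std::toupper(L'ω', std::locale::classic()): " << std::toupper(L'ω', std::locale::classic()) << std::endl;
  // C API: std::towupper() from <cwctype>, which uses the current C locale.
  fs << L"static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(L'ω'))): "
     << static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(L'ω'))) << std::endl;

  return 0;
}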

Content of out.txt when compiled with Visual Studio 2019 (save source file with UTF-8 encoding and add compiler switch /utf-8) and executed on Windows 10:

std::toupper(L'ω', std::locale::classic()): Ω
static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(L'ω'))): ω

Output with gcc version 8.4.0 (Ubuntu 8.4.0-1ubuntu1~18.04):

std::toupper(L'ω', std::locale::classic()): ω
static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(L'ω'))): ω
Daniel Eiband
  • 1
    Well, it depends on what the *default* locale is... and it depends on the local configuration, which you failed to fully describe (what is the LOCALE environment variable on your Ubuntu system, and how is your Windows system localized?). The `C` locale is specified by the standard, but there is no `C++` one: it just means *use the current system locale*. – Serge Ballesta Feb 23 '21 at 11:04
  • 1
    I'm using `std::locale::classic()` in my examples. Shouldn't this be invariant to the environment? The Ubuntu is running in WSL and the `LANG` environment variable is set to `C.UTF-8`. The Windows system is set to "English (United States)". – Daniel Eiband Feb 23 '21 at 12:12
  • This is my understanding of the default locales: By default (at program startup) the C++ locale is `std::locale::classic()` which means use the C locale. The default C locale is equivalent to `std::setlocale(LC_ALL, "C");` which is a minimal locale and [not the user-preferred locale](https://en.cppreference.com/w/cpp/locale/setlocale). I assume "minimal" implies environment independent. Unless `std::setlocale()` or `std::locale::global()` is called in the program, I would expect that no environment settings have any influence on the outcome of any `toupper()` function (neither C nor C++). – Daniel Eiband Feb 23 '21 at 12:53