Actually, the very example on the site shows a difference:
#include <iostream>
#include <cctype>
#include <clocale>

int main()
{
    unsigned char c = '\xb4'; // the character Ž in ISO-8859-15
                              // but ´ (acute accent) in ISO-8859-1

    std::setlocale(LC_ALL, "en_US.iso88591");
    std::cout << std::hex << std::showbase;
    std::cout << "in iso8859-1, tolower('0xb4') gives "
              << std::tolower(c) << '\n';
    std::setlocale(LC_ALL, "en_US.iso885915");
    std::cout << "in iso8859-15, tolower('0xb4') gives "
              << std::tolower(c) << '\n';
}
Output:
in iso8859-1, tolower('0xb4') gives 0xb4
in iso8859-15, tolower('0xb4') gives 0xb8
Because the C language has no notion of encoding, a char is just a byte, and a char const* just a sequence of bytes. Switching locale switches the interpretation of those bytes. Here, the byte 0xb4 (180) is outside the ASCII range (0-127), so its meaning depends on the locale you switch to:
- in ISO-8859-1, it means ´, and therefore is unchanged when moving from upper to lower
- in ISO-8859-15, it means Ž, and therefore changes to ž (0xb8 in this locale) when moving from upper to lower
You would think that in a post-Unicode world, this would be irrelevant, but many have not yet transitioned to Unicode...