I was just reading about std::tolower on cppreference.

Is std::tolower maybe just a wrapper around a std::use_facet call?

Please see the following example:

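#include <cctype>   // one-argument std::tolower
#include <iostream>
#include <locale>   // std::use_facet, std::ctype

int main() {
    char c1{ 'A' }, c2{ 'B' };

    // Lowercase through the ctype facet of the "C" locale.
    std::cout << std::use_facet<std::ctype<char>>(std::locale("C")).tolower(c1) << '\n';
    // Lowercase through the C library function, which takes and returns int.
    std::cout << static_cast<char>(std::tolower(static_cast<unsigned char>(c2))) << '\n';
}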

Yes, std::tolower takes and returns an int, but apart from that, is it calling use_facet or something similar?

A M
  • The C++ standard does not specify how, exactly, each library function is implemented. The standard specifies the results of each library function. How the library function gets implemented is not specified in the standard. This could be one possible implementation. But there's also the equivalent `tolower()` from the C library, that the C++ function can be a simple alias for, for example. – Sam Varshavchik Dec 24 '22 at 15:25
  • `std::tolower` is the C function - just another name for `::tolower` defined in `ctype.h`. It is highly unlikely it's implemented using C++ facilities - if anything, chances are the dependency goes the other way round. – Igor Tandetnik Dec 24 '22 at 16:25
  • A simple table lookup would be the obvious implementation. – john Dec 24 '22 at 16:27

1 Answer

What is under the hood of std::tolower?

Absolutely nothing useful.

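In principle the library can consult a locale to handle language-specific rules, but in practice this has been a long, frustrating pipe dream in C++. std::tolower converts one unsigned char value at a time, so it cannot deal with multi-byte encodings such as UTF-8.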

What do I do, then?

Use ICU (International Components for Unicode), originally developed at IBM. It is a mature, stable library that exists on practically every system where it makes sense to program with i18n. It is on Android and iOS phones (and all the knock-offs), it is on Windows, Linux, Unix, OS X, etc.

The tricky part is just interfacing with the installed system ICU. That is different for each system, but not particularly difficult. (It becomes part of the build script, like any other system-dependent step.)

ICU works with both C and C++ (though the C++ API is quite a bit leaner than the C API).

(You can also use it with Java, and ported interfaces exist for quite a few other languages as well.)

Since the question is tagged C++, I recommend you just use the C API of the library over a std::wstring (Windows, C++17 or earlier) or a std::u16string (Windows C++20+ and everything else).

Boost Libraries

Boost provides a very nice C++ library to do this kind of stuff.

You can configure Boost Locale to use ICU as a backend.

I haven’t messed with it for quite a long time, and configuring the compile (Boost Locale is one of the Boost Libraries that needs to be compiled) is tricky. Make your way through that and you are golden, though.

Caveats

Managing your locale becomes important. Your program should default to using the user’s system-indicated locale. ICU makes this easy to access and use.

Not every writing system has letter case. Case-conversion and case-folding functions understand this, and behave correctly for those languages.

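One particular point is that Turkish has a corner case you should be aware of: the letter I. Turkish has both a dotted pair (İ/i) and a dotless pair (I/ı), so uppercase I lowercases to ı, not to i. Any reading you do on letter casing should mention this.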

Remember also that the right locale depends on context. For example, you will likely wish to use a different locale for program code vs strings displayed to the user.

Dúthomhas
  • This is very good advice — *if you do any non-ASCII text processing at all*. Few people actually do that. – n. m. could be an AI Dec 24 '22 at 17:27
  • @n.m.: "*Few people actually do that.*" And a lot of them actually do that; they just don't realize they are/have to. Which makes the eventual work of supporting non-ASCII text much harder for everyone. – Nicol Bolas Dec 24 '22 at 17:49
  • @NicolBolas I always assume the best possible interpretation of another’s words. I think **n.m.** was saying that _few people bother to write their code with any non-ASCII text processing considerations_. In other words, I think we are all on the same page here... – Dúthomhas Dec 24 '22 at 18:26
  • @NicolBolas Most people copy strings around and concatenate them. No searching, no cutting, no transformation. No ICU is needed for this. – n. m. could be an AI Dec 24 '22 at 19:06
  • @n.m.: And then, when you suddenly have a need to do those things, you lack the tools for it. And probably think that you can just use C functions and it'll be OK, because everybody more or less uses ASCII, right? Or you don't even consider non-ASCII as a possibility, so you never ask the question. – Nicol Bolas Dec 24 '22 at 19:22
  • @NicolBolas When I suddenly need a cryptography library or a linear algebra library or an image processing library, I go and use them. Until then, I don't. Does a Unicode library have a special sacred status? – n. m. could be an AI Dec 24 '22 at 19:43
  • I really think we’re arguing the same side of the same coin here. Can’t we all just play nice? – Dúthomhas Dec 24 '22 at 19:55