
I think that it's safe to say that C locales are universally recognized as a bad idea.

Writing an application that parses or writes text-based machine formats (which happens quite often) with C standard library functions becomes nearly impossible if you have to account for the locale being set to anything other than "C". Since the locale is normally per-process (and setlocale is often not thread-safe), if you are writing a library or a multithreaded program it is not even safe to call setlocale(LC_ALL, "C") and restore the previous locale after doing your work.
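To make the concern concrete, here is a minimal sketch of the save/force/restore pattern described above. The helper name `parse_double_global_c` is hypothetical, and the code is shown to illustrate the pattern, not to recommend it: every `setlocale` call changes process-wide state, so another thread can observe (or clobber) the temporary "C" setting.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a double with LC_NUMERIC temporarily forced
 * to "C". NOT thread-safe, because setlocale() modifies process-wide state. */
int parse_double_global_c(const char *text, double *out)
{
    /* Query the current LC_NUMERIC setting without changing it. */
    const char *current = setlocale(LC_NUMERIC, NULL);
    if (current == NULL)
        return -1;

    /* Copy the name: the returned string may be overwritten by a later
     * setlocale() call. */
    size_t len = strlen(current) + 1;
    char *saved = malloc(len);
    if (saved == NULL)
        return -1;
    memcpy(saved, current, len);

    setlocale(LC_NUMERIC, "C");      /* other threads see this change too */
    char *end;
    double value = strtod(text, &end);
    setlocale(LC_NUMERIC, saved);    /* restore whatever was active before */
    free(saved);

    if (end == text)
        return -1;                   /* no characters were consumed */
    *out = value;
    return 0;
}
```

A caller would use it as `double v; if (parse_double_global_c("3.5", &v) == 0) { ... }`. In a single-threaded program this works; in a library or a multithreaded program the window between the two `setlocale` calls is exactly the hazard the paragraph above describes.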

Now, for these reasons the rule is normally "avoid setlocale, period"; but: we've been bitten several times in the past by the peculiar behavior of QCoreApplication and derived classes; the documentation says:

On Unix/Linux Qt is configured to use the system locale settings by default. This can cause a conflict when using POSIX functions, for instance, when converting between data types such as floats and strings, since the notation may differ between locales. To get around this problem, call the POSIX function setlocale(LC_NUMERIC,"C") right after initializing QApplication or QCoreApplication to reset the locale that is used for number formatting to "C"-locale.
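As a rough illustration of what that workaround amounts to, the following sketch uses plain C with no Qt involved. The function `init_locale_with_c_numeric` is a hypothetical stand-in for "application start-up followed by the documented fix"; it is not Qt code, and the first call merely mirrors the `setlocale(LC_ALL, "")` mentioned in this question.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical stand-in for start-up code; this is not taken from Qt. */
void init_locale_with_c_numeric(void)
{
    /* Adopt the user's environment for every category... */
    setlocale(LC_ALL, "");
    /* ...then pin number formatting and parsing back to the "C" locale. */
    setlocale(LC_NUMERIC, "C");
}

/* Format a value with two decimals; the decimal separator follows
 * whatever LC_NUMERIC is in effect when this is called. */
int format_double(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%.2f", value);
}
```

After `init_locale_with_c_numeric()`, `format_double` writes a period as the decimal separator regardless of the user's environment. Every other category (LC_CTYPE, LC_TIME, LC_COLLATE, and so on) still follows the environment, so functions that depend on those categories remain locale-sensitive.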

This behavior has been described in another question; my question is: what could be the rationale for this apparently foolish behavior? In particular, what is so peculiar about Unix and Linux that prompted such a decision only on these platforms?

(Incidentally, will everything break if I just do setlocale(LC_ALL, "C"); after creating the QApplication? If it's fine, why don't they just remove their setlocale(LC_ALL, "");?)

Peter Mortensen
Matteo Italia
  • On Linux, wide-char functions (e.g. wcstok) have an extra parameter to make them thread-safe. Qt surely uses the standard libc wide-char functions on Linux... – j-p Nov 02 '14 at 17:38

3 Answers


From investigations through the Qt source code conducted by @Phil Armstrong and me (see the chat log), it seems that the setlocale call is there since version 1 for several reasons:

  • XIM, at least in ancient times, didn't correctly "get" the current locale without such a call.
  • On Solaris, it even crashed with the default C locale.
  • On Unix systems, it's used (among other systems, in a complex game of fallbacks) to "sniff" the "system character set" (whatever that means on Unix), and thus be able to convert between the QString representation and the "local" 8 bit encoding (this is particularly critical for file paths).

It's true that it already checks the LC_* environment variables, as it does with QLocale, but I suppose that it may be useful to have nl_langinfo decode the current LC_CTYPE if the application explicitly changed it (but to see if there is an explicit change, it has to start with system defaults).

It's interesting that they called setlocale(LC_NUMERIC, "C") immediately after the setlocale(LC_ALL, ""), but this was removed in Qt 4.4. The rationale for this decision seems to lie in task #132859 of the old Qt bugtracker (which moved between TrollTech, Nokia and QtSoftware.com before vanishing without leaving any trace, not even in the Wayback Machine), and it's referenced in two bugs regarding this topic. I think that an authoritative answer on the topic was there, but I can't find a way to recover it.

My guess is that it introduced subtle bugs, since the environment seemed pristine, but it was in fact touched by the setlocale call in all but the LC_NUMERIC category (which is the most evident); probably they removed the call to make the locale setting more evident and have application developers act accordingly.

Community
Matteo Italia
  • Good summary Matteo. My personal belief is that a well-behaved unix application should call `setlocale(LC_ALL, "")` in the application initialisation phase (probably near the start of `main()`). Inside a dynamically loaded library like Qt is however not a good place, for reasons of programmer surprise if nothing else. The history we uncovered suggests that the Qt devs had good reasons to include it originally, and the effects of removing the code may make the Qt devs reluctant to remove it. – Phil Armstrong Nov 16 '14 at 21:04
  • The "C" locale being the default on Apple (as the result of setlocale with an empty string) causes applications to crash with invalid character string errors. It is also ill-advised to try to use POSIX wide-char functions in a Qt program when the framework offers a portable interface for the same functionality – Swift - Friday Pie Feb 14 '17 at 08:15

What is so peculiar about POSIX systems (which include the Unix/Linux systems you mention) is that the OS interface and the C interface are mixed up. The C setlocale call in particular interferes with the OS.

On Windows, in comparison, the locale is explicitly a per-thread property (SetThreadLocale), but more importantly, functions such as GetNumberFormat accept a locale parameter.

Note that your problem is fairly easily solved: When using Qt, use Qt. So that means reading your text input into a QString, processing it, and then writing it back.

Peter Mortensen
MSalters
  • How does `setlocale` change anything more than it does on Windows? They both affect just the C standard library functions (the kernel knows nothing of locale), which AFAICT seem to be bypassed by Qt anyway (for localization purposes it seems to have its own QLocale, unrelated to the broken C/C++ facilities). Also, unfortunately the problem is not so easy to solve - we have several libraries that have to be used from "regular" C++, Qt-C++, "regular" Python (through SIP) and Python+PyQt; always using Qt internally is neither an option, nor is it actually necessary. – Matteo Italia Sep 04 '14 at 12:23
  • POSIX has the C standard library locale and builds on that. Other than that, the Linux kernel doesn't have a locale. Windows on the other hand has native locale support, even for non-C languages. So it's not that `setlocale` changes more on Linux, it's that there are things it can't change on Windows. – MSalters Sep 04 '14 at 17:06
  • But those things don't seem to be of interest to Qt: in the whole Qt source tree there are only two calls to `SetThreadLocale` and one to `SetLocaleInfo`, and they are all in unit tests. Also, if Qt needed a "general locale setup" at the creation of the `QApplication` it would be reasonable to find it all in the same place, but it only happens for Unix-based OSes. That's why I'm perplexed. – Matteo Italia Sep 04 '14 at 20:22

Qt calls setlocale(LC_ALL, ""), because it's the right thing to do: Every standard Unix program from cat on up calls setlocale(LC_ALL, ""). The consequence of that call is that the program locale is set to that specified by the user. See the setlocale() manpage:

On startup of the main program, the portable "C" locale is selected as default. A program may be made portable to all locales by calling:

setlocale(LC_ALL, "");

after program initialization...

Given that Qt both generates text to be read by the user and parses input generated by the user, it would be very unfriendly to refuse to let the user communicate with the program in their own locale-specific ways. Hence the call to setlocale().

I would hope that being user friendly would be uncontroversial! The problem of course comes when you try to parse data files that were created by your program running under a different locale. Clearly, if you're using an ad-hoc text-based format with a parser based on sscanf and friends, rather than a specified data format with a "real" parser then this is a recipe for data corruption if done without consideration of the locale settings. The solution is to either a) use a real serialisation library that handles this stuff for you or b) set the locale to something specific ("C" perhaps) when writing and reading data.

If thread safety is an issue then on modern POSIX implementations (or any Linux system with GNU libc version >= 2.3, which is pretty much "all of them" at this point in time) you can call uselocale() to set a thread-local locale for all I/O. Alternatively, you can call the _l versions of the usual functions, which take a locale object as a supplementary argument.
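A minimal sketch of the uselocale() approach, assuming a POSIX.1-2008 system such as Linux with glibc (on macOS the same functions are declared in `<xlocale.h>`). The helper name `parse_double_thread_c` is hypothetical; `newlocale`, `uselocale` and `freelocale` are the POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: parse a double using the "C" locale for the calling
 * thread only, leaving the process-wide locale untouched. */
int parse_double_thread_c(const char *text, double *out)
{
    /* Build a locale object with every category set to "C". */
    locale_t c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_locale == (locale_t)0)
        return -1;

    /* Install it for this thread; the previous per-thread setting
     * (possibly LC_GLOBAL_LOCALE) is returned so it can be restored. */
    locale_t previous = uselocale(c_locale);

    char *end;
    double value = strtod(text, &end);

    uselocale(previous);     /* restore before freeing the object */
    freelocale(c_locale);

    if (end == text)
        return -1;           /* no characters were consumed */
    *out = value;
    return 0;
}
```

Creating and freeing a locale object on every call keeps the sketch short; real code would probably create the "C" locale object once and reuse it. Third-party code that calls plain `strtod` on the same thread also sees the thread-local locale while it is installed, but code running on other threads does not.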

Will everything break if you call setlocale(LC_ALL, "C");? No, but the right thing is to let the user set the locale they prefer and either save your data in a well specified format or specify the locale in which your data is to be read and written at runtime.

Peter Mortensen
Phil Armstrong
  • Anyhow, the merits of C locales are mostly irrelevant; Qt has its own (way better) facilities to deal with localization (see `QLocale` and the translations framework), which don't seem to use C locales in any way. Also, the argument about imposing "the right thing" to C functions doesn't hold water, since on Windows the `setlocale` call is avoided altogether. Qt probably needs to do that call for some strange side-effect that is needed only on POSIX, but I can't exactly pinpoint what it is. – Matteo Italia Oct 28 '14 at 21:15
  • That would be `strtod_l()`. Or just call `uselocale()`. – Phil Armstrong Oct 28 '14 at 21:16
  • None of them is available on the Linux machine I'm writing from, none of them is portable C; the Ruby interpreter even went on to bundle its own - slightly broken - version of `strtod` because there's no portable safe alternative. Even if I were to use a nonstandard function in *my* code, I certainly cannot go fixing any third party library which may use `strtod`. Seriously, the only safe way to go in C is to stick to the C locale. But again, we are digressing, the point is "why does Qt make this call, which can potentially break a lot of stuff, and why only on POSIX"? – Matteo Italia Oct 28 '14 at 21:22
  • `strtod_l()` and `uselocale()` have been in every Linux distribution since about 2002. There's no man-page for `strtod_l` for some reason, but `uselocale()` and friends have perfectly good manpages. They were also standardised in POSIX.1-2008: http://pubs.opengroup.org/onlinepubs/9699919799/functions/uselocale.html & have nice feature test macros you can call. – Phil Armstrong Oct 28 '14 at 21:32
  • My bad then, I was deceived by seeing no manpages; anyhow, the other objections stand still (and, again, the ugliness of the C locales is not the point of my question). – Matteo Italia Oct 28 '14 at 21:34
  • The short answer is probably "because IEEE Std 1003.1, 2004 Edition says so": http://pubs.opengroup.org/onlinepubs/009695399/functions/setlocale.html – Phil Armstrong Oct 28 '14 at 21:38
  • I'm quite sure that neither POSIX nor C99 talk about what Qt has to do with `setlocale`. – Matteo Italia Oct 28 '14 at 21:42
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/63812/discussion-between-phil-armstrong-and-matteo-italia). – Phil Armstrong Oct 28 '14 at 22:01