
The GNU gettext manual describes the dcgettext function as follows:

Both take an additional argument at the first place, which corresponds to the argument of textdomain. The third argument of dcgettext allows to use another locale category but LC_MESSAGES. But I really don’t know where this can be useful. If the domain_name is NULL or category has an value beside the known ones, the result is undefined. It should also be noted that this function is not part of the second known implementation of this function family, the one found in Solaris.

Source: https://www.gnu.org/software/gettext/manual/html_node/Ambiguities.html

Is there any use for providing a different category than the default LC_MESSAGES for message translation? What would it even do? (Does it use the locale setting for that different category rather than the locale setting for LC_MESSAGES? What happens if LANGUAGE is set - wouldn't it override that category anyway, or does it only override LC_MESSAGES?) Since even the documentation writers are struggling to find a purpose for this feature, I really question whether it has any purpose at all. Trying

ls /usr/share/locale/*/LC_[^M]*

turned up no files, so it appears nobody is using this. But can anyone provide insight on what this feature was/is for and whether it's useful?

R.. GitHub STOP HELPING ICE

2 Answers


Apparently dcgettext was provided "for compatibility."

Quoting the GNU C Library's Translation with gettext section, third paragraph from the bottom:

The dcgettext function is only implemented for compatibility with other systems which have gettext functions. There is not really any situation where it is necessary (or useful) to use a different value but LC_MESSAGES in for the category parameter. We are dealing with messages here and any other choice can only be irritating.

(Personally I don't find this a particularly satisfying answer, as it gives no hint regarding which "other systems" they were seeking compatibility with -- but it's the only authoritative explanation I've found so far.)

Hephaestus

Edited for better examples:

  • A friends' birthday reminder widget

    This requires dcgettext() or dcngettext(), because the one localized string, "It's %s's birthday today!", should use the LC_TIME category rather than the LC_MESSAGES category: users would expect the LC_TIME environment variable to control the language of the widget, the same way it does for e.g. the date command.

  • A restaurant bill splitter widget

    To make it easier to understand and split bills in other countries (especially countries where you can barely understand the bill), this would use LC_MONETARY category for the bill fields, so that the users can select the currency by changing the LC_MONETARY environment variable.

    Let's assume the widget is intended for traveling users, or is perhaps supported by a simple server backend, which stores descriptions and numeric amounts, but no monetary units. Each bill is a simple dataset, containing a locale, a total amount, a description string, and a list of participants, each participant specified by a string and a number. The sum of the numbers should always be at least the total amount, with any excess being the tip.

    The user interface (menus, options, etc.) is localized as normal using the LC_MESSAGES category, but each bill's locale overrides the LC_NUMERIC and LC_MONETARY locale categories, and the application-specific strings in the widget -- "total", "tip" and so on -- use the LC_MONETARY category in the localization file. (Therefore the code would have dcgettext(NULL,"Total",LC_MONETARY), dcgettext(NULL,"Tip",LC_MONETARY) and so on.)

    When creating a new bill, you can implement the locale selection by simply switching to the desired locale in LC_MONETARY and/or LC_NUMERIC category.

    The reason you would want to do this is simple: you could have a user interface that shows the typical bill according to the local conventions (per restaurant locale), while the rest of the user interface, especially tool tips, hints, help et cetera, is still in the main locale/language (as determined by LC_MESSAGES).

    Regardless of whether the widget was a graphical Qt/GTK+ or a command-line one, it could always use the normal environment variables to define its initial locale (LC_MESSAGES for user interface, LC_MONETARY and LC_NUMERIC for the new bill).

    Most programmers would likely use a configuration file or manager or registry key to store the locale, but since it is trivially available, well standardized, why duplicate the functionality? Moreover, a user could create aliases or shortcuts that simply set a different initial locale (for the two categories), and could have multiple instances of the widget open, using different billing locales, for example for comparison or understanding the bill.
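
As a rough sketch of the bill splitter idea, the two kinds of strings could be routed through two small helpers. The names ui_text(), bill_label() and format_total_line() are hypothetical and exist only for this illustration. It assumes the default text domain, and that the bill label catalogs are installed in an LC_MONETARY directory (GNU gettext looks for catalogs under dirname/locale/category/domain.mo), which is not something the original widget description spells out.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: menus, tool tips and help follow LC_MESSAGES. */
static const char *ui_text(const char *msgid)
{
    return gettext(msgid);
}

/* Hypothetical helper: bill labels follow LC_MONETARY instead. */
static const char *bill_label(const char *msgid)
{
    return dcgettext(NULL, msgid, LC_MONETARY);
}

/* Hypothetical helper: writes "<Total label>: <amount>" into buf and
 * returns what snprintf() returns. The amount is passed in as text
 * that has already been formatted for the bill's locale. */
static int format_total_line(char *buf, size_t n, const char *amount)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%s: %s", bill_label("Total"), amount);
}
```

If no catalog is found for the requested category, both helpers simply return the msgid they were given, so the widget falls back to the untranslated English labels.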


gettext(msgid) is equivalent to dgettext(NULL,msgid) is equivalent to dcgettext(NULL,msgid,LC_MESSAGES).

In fact, in current GNU gettext, gettext(msgid) is a wrapper around dcgettext(NULL,msgid,LC_MESSAGES), and dgettext(domain,msgid) is a wrapper around dcgettext(domain,msgid,LC_MESSAGES).
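
The equivalence can be checked with a few lines of C. This is only a sketch: same_lookup() is a hypothetical name, and it compares the results of the three calls rather than inspecting how the library implements them.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical self-check: returns 1 if all three spellings of the
 * lookup yield the same text for msgid, 0 otherwise. */
static int same_lookup(const char *msgid)
{
    assert(msgid != NULL);

    const char *a = gettext(msgid);
    const char *b = dgettext(NULL, msgid);
    const char *c = dcgettext(NULL, msgid, LC_MESSAGES);

    return strcmp(a, b) == 0 && strcmp(b, c) == 0;
}
```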

The category parameter to dcgettext() allows you to select which category is used to determine the locale. For example, if you used dcgettext(NULL, "FOO", LC_MONETARY), then the LC_MONETARY category would be used to determine the actual locale to use. Because the C library provides the category-specific functions like strftime() (uses the LC_TIME category) and strcoll() (uses the LC_COLLATE category), most applications only explicitly use the LC_MESSAGES category. (They do, however, use the other categories via the C library functions.)
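
To illustrate how an application uses a category indirectly through the C library, here is a minimal sketch built on strftime(), which takes its day names from the LC_TIME category. The helper name weekday_name() is made up for this example; a real program would first call setlocale(LC_TIME, "") or setlocale(LC_ALL, "") so that the user's environment is honored.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: writes the full weekday name for wday
 * (0 = Sunday ... 6 = Saturday) into buf, using whatever locale is
 * currently active for LC_TIME. Returns the strftime() result. */
static size_t weekday_name(int wday, char *buf, size_t n)
{
    assert(wday >= 0 && wday <= 6);

    struct tm tm = {0};
    tm.tm_wday = wday;          /* %A only needs the weekday field */
    return strftime(buf, n, "%A", &tm);
}
```

Without any setlocale() call the program runs in the C locale, so weekday_name(4, ...) yields the English name; after setlocale(LC_TIME, "fi_FI.utf8") succeeds it would yield the Finnish one, while gettext() lookups keep following LC_MESSAGES.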

The user can control the locale for each category via environment variables.

For the GNU C library, the environment variables are interpreted as follows:

  1. If LC_ALL is not empty, it defines the locale for all categories.

    Otherwise:

  2. If LC_CATEGORY is not empty, it defines the locale for category CATEGORY.

    Otherwise:

  3. If LANG is not empty, it defines the locale.

    Otherwise:

  4. The locale is C/POSIX.

In other words, setlocale() ignores LANGUAGE (GNU gettext consults it separately when looking up messages), and LANG is only used if both LC_ALL and the relevant LC_category environment variables are empty or undefined.
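
The precedence above can be written down as a small function. This is a sketch of the rule as listed here, not the C library's actual code; resolve_locale() and nonempty_env() are hypothetical names, and the function only reads environment variables without calling setlocale().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns the value of an environment variable,
 * or NULL if it is unset or empty. */
static const char *nonempty_env(const char *name)
{
    const char *value = getenv(name);
    return (value != NULL && value[0] != '\0') ? value : NULL;
}

/* Hypothetical helper: resolves the locale name for one category,
 * given the name of its variable (for example "LC_TIME"), following
 * the order LC_ALL, LC_category, LANG, then "C". */
static const char *resolve_locale(const char *category_var)
{
    const char *value;

    assert(category_var != NULL);

    if ((value = nonempty_env("LC_ALL")) != NULL)
        return value;
    if ((value = nonempty_env(category_var)) != NULL)
        return value;
    if ((value = nonempty_env("LANG")) != NULL)
        return value;
    return "C";
}
```

For example, with LC_ALL empty, LC_COLLATE=fi_FI.utf8 and LANG=en_US.UTF-8, resolve_locale("LC_COLLATE") gives the Finnish locale while resolve_locale("LC_TIME") falls through to LANG.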

In my experience, other OSes with gettext or similar localization support follow the same environment variable pattern -- LC_ALL being the override, LC_category being the specific setting, with LANG (and possibly LANGUAGE) as defaults if nothing else is set.


It is very useful to use a mixed-locale environment, where LC_ALL is undefined or empty, some of the LC_* environment variables are set to a specific locale, and the others are undefined, empty, or C, possibly with a default LANG defined just to be sure.

I personally sometimes use

LC_ALL= \
LC_TIME=C \
LC_NUMERIC=C \
LC_CTYPE=C \
LC_MESSAGES=C \
LC_COLLATE=fi_FI.utf8 ls -laF --color=auto

as my ll alias. It lists the files and directories in the specified directory, using the C/POSIX locale for everything except string collation (string sort order), which uses Finnish rules. That gives me the output sorted according to typical Finnish rules, while everything else stays in the C/POSIX locale.

I might switch to a LC_TIME locale that used ISO 8601 dates, or perhaps a human-friendly version of ISO 8601 (YYYY-MM-DD HH:MM:SS.ttt TZ). Just haven't yet cared enough to look for one or write one myself.

Questions?

Nominal Animal
  • I don't understand your examples. `LC_MESSAGES` and/or `LANGUAGE` is used to request the *language of messages* the user wants. For the first example where `LC_TIME` is requesting day/month names in a particular language, I can partly see how you would want the associated message strings to match the language of the day/month names... – R.. GitHub STOP HELPING ICE Jul 28 '14 at 05:32
  • ... but the second example makes no sense at all to me. `LC_MONETARY` is not affecting the language of anything in the normal locale system usage; it's just affecting the currency name/sign and formatting of monetary numbers. So I don't get the motivation for why a program would want to show the user messages in a language based on the currency in use rather than based on the user's selected messages language. – R.. GitHub STOP HELPING ICE Jul 28 '14 at 05:34
  • @R..: Edited in hopes of describing the intent better. I removed the third example, and expanded on the second example. The idea is that one might want to use one locale for the user interface (selected via the `LC_MESSAGES` category), and another locale for bill contents (via `LC_MONETARY`). Assume you're abroad: you'd set `LC_MONETARY` to the locale, but keep UI in English, and you'd see what kind of bill you can expect there (including strings for "Total" and "Tip"). If graphical, tooltips and menu would still be in `LC_MESSAGES` category locale. Would work much like `date` utility. – Nominal Animal Jul 28 '14 at 09:32
  • Unless I've seriously misunderstood, LC_TIME is *supposed* to specify how time should be formatted and it shouldn't change any strings related to context of time. LC_TIME is used to select between `2020-12-31` and `31/12/2020`. Similarly, LC_MONETARY is supposed to select between `-1234 €` vs `($ 1,234)`. Some countries use different markup for negative amounts of money compared to negative numbers and that's the reason why there's both LC_MONETARY and LC_NUMERIC. The language of messages (strings) displayed to the user should follow LANGUAGE, LC_ALL, LC_MESSAGES, LANG (in priority order). – Mikko Rantalainen May 12 '21 at 09:23
  • Note that the environment variable LANGUAGE is not supposed to override e.g. LC_TIME but only select the language of messages. The whole point of using LANGUAGE is to allow fallback to alternative languages if the translation is not available using just LC_MESSAGES. Due to a historical mistake, LANGUAGE overrides LC_MESSAGES even if LC_ALL or LC_MESSAGES points to an available translation. If I were to decide, the priority order would be LC_ALL, LC_MESSAGES, LANG, LANGUAGE because the whole point of LANGUAGE is to provide better fallback than category `C`. – Mikko Rantalainen May 12 '21 at 09:29