Extract single translation from gettext .po file from shell

Question

As a build step of my C++ application (using CMake as a build system) I need to create some template files that should include localized strings.

The strings are available from translators already in the form of gettext .po files (the same that will be used for translation of the application itself).

I therefore need a way to extract the translation of a given English source string from a .po file (either via Bash/shell or via CMake).

What I came up with so far is the following:

translated_string=$(
    msggrep --msgid -e "^${untranslated_string}$" "${po_file}" \
        | msgattrib --no-fuzzy \
        | grep -A 1 "msgid \"${untranslated_string}\"" \
        | sed -n 's/msgstr "\(.*\)"/\1/p'
)

Obviously those are a lot of calls for a "simple" function:

msggrep outputs a .po file that only has the string I want
msgattrib makes sure the translation is not "fuzzy" (i.e. needs updating) as I can't use those
Then I manually extract the translation using grep and sed

I imagine there has to be a better approach to this? After all, gettext does make it easy to translate my application at runtime, but it seems somewhat inflexible at build time...

Guido Flohr · Answer 1 · 2019-02-16T07:45:27.743

1

.po files are meant for translators. Software should use compiled .mo files for retrieval of translations.

GNU gettext ships with a binary gettext which is meant to internationalize shell scripts. You can use that for your purposes as follows:

$ mkdir -p de/LC_MESSAGES 2>/dev/null
$ msgfmt --verbose --statistics de.po -o de/LC_MESSAGES/package.mo
$ TEXTDOMAINDIR=. LANGUAGE=de LANG=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8 gettext package 'Hello, world!'

Replace "package" with your textdomain, "Hello, world!" with your message id and de with the language of your choice. Note that this requires that the selected locale - in this case a de locale - is installed on the build system.
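
As a rough illustration of what "installed" means here: a locale is usable when the C library's setlocale() accepts its name. The following is only a sketch in Python, not part of the gettext tools; `locale_is_installed` is a hypothetical helper name, and the accepted locale names differ between platforms.

```python
import locale


def locale_is_installed(name):
    """Return True if the C library accepts `name` as a locale name."""
    # Calling setlocale() without a value only queries the current setting.
    saved = locale.setlocale(locale.LC_ALL)
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False
    finally:
        # Always restore whatever was active before the probe.
        locale.setlocale(locale.LC_ALL, saved)
```

Calling `locale_is_installed("de_DE.UTF-8")` before running the gettext command gives an early hint whether the lookup can succeed on that build machine.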

See gettext(1) and msgfmt(1) for more information.

edited Feb 16 '19 at 07:45

answered Feb 15 '19 at 10:26


This is a nice solution and I could make it work in principle. However it does not provide translations for many locales (i.e. just prints the English string). It seems not to be working for non-latin locales unfortunately, which is a pity given that it works well otherwise. Can you elaborate on your note? Could this be related? What do you mean by "installed locale" (i.e. what decides whether a locale is installed or not?). – Patrick Storz Feb 15 '19 at 18:51
The command `locale -a` lists your installed locales. Are the "non-latin" locales installed on your system? Try `strace` or the equivalent for your platform to see which paths are tried for `$PATH/gettext`. – Guido Flohr Feb 15 '19 at 21:09
Yes, in principle locales seem to be installed, but this is quickly getting more complicated and fragile than what I had initially... Even if I ignore the fact that gettext does not properly print all translations, I'll hit just another issue: The encoding of the printed strings is completely messed up and I don't seem to be able to make it print as UTF-8, therefore even if I get all translations, I'll still not be able to properly use them. I'll do one more trial with Python's gettext implementation, otherwise I'll just stick with my shell script. – Patrick Storz Feb 15 '19 at 21:53
Please read the user docs for example at http://www.sensi.org/~alec/locale/other/about-nls.html. You can maybe fix your problem by not only setting `LANGUAGE` but also `LANG` and include the charset in the latter. I will update my answer accordingly. – Guido Flohr Feb 16 '19 at 07:44
I had already tried multiple combinations of `LANGUAGE` and `LANG`, including and excluding region, as well as appending encoding, all to no avail. Either my version of gettext is buggy (I'm using a native mingw-w64 version compiled by MSYS2 project) or the program is just "too clever" to simply output a string from one file in the locale I want. I've solved it in Python now (I'll add an answer), which was pretty straightforward, works just as well as my shell script version but is significantly more performant. (it could've saved me a lot of time if I had started with that...) – Patrick Storz Feb 17 '19 at 00:52

score -1 · Answer 2 · answered Feb 17 '19 at 01:26

Using binary .mo files and extracting the string from there as suggested by Guido Flohr in https://stackoverflow.com/a/54707280/2514664 turned out to be a workable solution.

However, using the native gettext executable to extract the string, as proposed in the original answer, turned out to have too many complications (after all, it is designed to pick a user-visible string in the locale most suitable for the user from an existing message catalog, not to serve as a build tool which "just" has to extract a single string in a specific language from a specific file). It's ultimately too fragile for a build system that is supposed to work on multiple platforms.

It turns out using Python and its own gettext module works a lot better and can save a lot of effort (especially if you're comfortable with Python).

Generating the .mo files from the .po files can work as suggested in the linked answer:

mkdir -p locale/de/LC_MESSAGES
msgfmt de.po -o locale/de/LC_MESSAGES/package.mo

Getting the translation in Python is then as easy as:

import gettext

# Loads locale/de/LC_MESSAGES/package.mo
translation = gettext.translation('package', localedir='locale', languages=['de'])
# Look up the translation of a single msgid
translated_string = translation.gettext('untranslated_string')
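
One caveat for build scripts: `gettext()` hands back the msgid unchanged when the catalog has no entry for it, and msgfmt by default leaves fuzzy and untranslated entries out of the .mo file. A minimal sketch, assuming the build step wants `None` in that case and wants to open one specific .mo file by path; `lookup_translation` and `_MissingMarker` are hypothetical names, not part of the `gettext` module:

```python
import gettext


class _MissingMarker(gettext.NullTranslations):
    """Fallback catalog that reports every string as untranslated (None)."""

    def gettext(self, message):
        return None


def lookup_translation(mo_path, msgid):
    """Return the translation of msgid from one .mo file, or None if absent."""
    with open(mo_path, "rb") as mo_file:
        # The catalog is parsed completely in the constructor.
        catalog = gettext.GNUTranslations(mo_file)
    # GNUTranslations consults its fallback when the msgid is not in the catalog.
    catalog.add_fallback(_MissingMarker())
    return catalog.gettext(msgid)
```

With the file generated above, `lookup_translation('locale/de/LC_MESSAGES/package.mo', 'untranslated_string')` would return either the German string or `None`, so the build can decide how to handle missing translations.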

Btw. after solving my own problem I found that there are also msgfmt.py and pygettext.py scripts in my Python distribution's "/Tools/i18n" folder, which are meant to offer functionality similar to msgfmt and xgettext respectively. Both of them could be interesting to address similar issues, either to use directly or to look at their implementation to create something new (msgfmt.py includes a simple .po file parser for example).
