
I just worked through the documentation of http://www.gnu.org/software/gettext/manual/gettext.html and there is no discussion at all about the performance overhead. On the internet, I found only performance discussions for other languages (PHP and Java) but nothing for C/C++.

Therefore my questions:

  1. What is the performance overhead during startup of a program that uses gettext? (Loading the shared library? How are the translations loaded into memory? Are all translations loaded at startup or on demand?)

  2. What is the performance penalty during normal operation of the program (i.e. when a translation is needed)? How much larger is the memory footprint of the program, and how is that memory organized? Is there a higher danger/possibility that parts of the program are swapped to disk when the program is idle? (If the translations are stored in a very different part of memory than the rest of the program, then in my understanding the chance of a page fault is higher than in an un-internationalized version of the program.)

  3. Does a program that runs under the "C" locale also suffer these performance penalties?

Thanks a lot.

Robby75
  • A program which uses `gettext` is generating human-readable output. The overhead is less than the time it will take the human to read the text, so it can be considered negligible. (That's not strictly true, but realistically, the overhead is _not_ an issue.) – James Kanze Aug 16 '13 at 08:27
  • @JamesKanze the program might also generate longish reports, or it might send personalized mass-emails, or ... That the output is human-readable does not imply that there's a human around, and that the program can stop until he/she's finished reading the output. – Chris Aug 16 '13 at 08:42
  • @James: Even one single missing byte that causes a page fault will cause a delay that is noticeable even by a human. In extreme cases it may cause a delay of several seconds if the hard disk has to be spun up. Also, quite a lot of command-line programs are started from scripts, which may mean that a program is started/stopped thousands of times. But the real point is that "it's negligible" is not an answer to the questions, as it's only negligible in some cases. – Robby75 Aug 16 '13 at 08:55
  • @James2: For example if I start a memory-intensive task (for example git-operations eat all memory if you let it) and leave the workstation, all idle programs will be swapped to disk. If the translations are stored far away from the main program, they will also be swapped to disk when no translations are needed at the moment EVEN IF the main program is not idle and in main memory. Then, when I start using the workstation, dozens or even hundreds of processes get swapped back into main memory - in that case it may take several seconds until the translations are available in main memory. – Robby75 Aug 16 '13 at 09:02

2 Answers


Given that the alternative to this approach is to have a large number of builds, each with something like this in it:

#include <stdio.h>

int main()
{
    printf(
#ifdef SWEDISH
           "Hej världen\n"
#elif defined(ENGLISH)
           "Hello, World\n"
#elif defined(PORTUGUESE)
           "Olá, Mundo\n"
#else
#error Language not specified.
#endif
    );
    return 0;
}

instead we get:

#include <libintl.h>
#include <stdio.h>

int main()
{
    printf(gettext("Hello, World\n"));
}

which is easy to read and understand.

I don't know the exact structure of the gettext implementation, but I would expect it to be a hash table once it's loaded. Possibly a binary tree, but a hash table seems more sensible.

As to the exact overheads, it's very hard to put a number on it - especially, as you say, if something is swapped to disk, and the disk has stopped, it takes 3-4 seconds to get the disk up to speed. So how do you quantify that? Yes, it's possible that the page needed for gettext is swapped out if the system has been busy doing something memory intensive.

Loading the message file should only be a large overhead if the file is very large, but again, if the disk is not spinning and the file is not cached, there will be an overhead of several seconds. Again, how do you quantify that? The size of the file is clearly directly proportional to the actual size of the translated (or native-language) messages.

Regarding point 2:

As far as I know, in both Linux and Windows, pages are swapped out on a "least recently used" (or some other usage-statistics) basis, which has nothing to do with where they are located. Clearly the translated messages are in a different place than the actual code - there isn't a list of 15 different translations in the source file - so the translations are loaded at runtime and will be located in a different place than the code itself. However, the overhead of this is similar to the overhead difference between:

static const char *msg = "Hello, World\n";

and

static const char *msg = strdup("Hello, World\n"); 

Given that text-strings are generally kept together in the binary of a program anyway, I don't think their "nearness" to the executing code is significantly different from a dynamically allocated piece of memory somewhere in the heap. If you call the gettext function often enough, that memory will be kept "current" and not swapped out. If you don't call gettext for some time, it may get swapped out. But that applies to "none of the strings stored in the executable have been used recently, so they got swapped out".

3) I think English (or "no language selected") is treated exactly the same as any other language variant.

I will have a little further dig in a bit, need breakfast first...

Very unscientific:

#include <libintl.h>
#include <cstdio>
#include <cstring>

static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}


int main()
{
    char str[10000] = {};
    char *s = str;
    unsigned long long time;

    for(int i = 0; i < 10; i++)
    {
        time = rdtsc();
        s += sprintf(s, "Hello, World %d", i);
        time = rdtsc() - time;
        printf("Time =%lld\n", time);
    }
    printf("s = %s\n", str);
    s = str;

    strcpy(s, "");
    for(int i = 0; i < 10; i++)
    {
        time = rdtsc();
        s += sprintf(s, gettext("Hello, World %d"), i);
        time = rdtsc() - time;
        printf("Time =%lld\n", time);
    }
    printf("s = %s\n", str);
}

Gives the following results:

$ g++ -Wall -O2 intl.cpp
$ ./a.out
Time =138647
Time =9528
Time =6710
Time =5537
Time =5785
Time =5427
Time =5406
Time =5453
Time =5644
Time =5431
s = Hello, World 0Hello, World 1Hello, World 2Hello, World 3Hello, World 4Hello, World 5Hello, World 6Hello, World 7Hello, World 8Hello, World 9
Time =85965
Time =11929
Time =10123
Time =10226
Time =10628
Time =9613
Time =9515
Time =9336
Time =9440
Time =9095
s = Hello, World 0Hello, World 1Hello, World 2Hello, World 3Hello, World 4Hello, World 5Hello, World 6Hello, World 7Hello, World 8Hello, World 9

The code in dcigettext.c uses a mixture of a binary search in a flat array of strings and a hash function that hashes the string with the PJW hash (see: http://www.cs.hmc.edu/~geoff/classes/hmc.cs070.200101/homework10/hashfuncs.html ).

So, the overhead, once the application has started, appears to be around "just noticeable" (when counting clockcycles), but not enormous.

The exact time the first sprintf takes varies somewhat in both cases, so I wouldn't say that "using gettext" makes sprintf faster on the first call - just "bad luck" on this run (I had a few other variants of the code, and they all vary greatly on the first call to sprintf, and less on later calls). Probably some setup somewhere (possibly caches [printf causing caches to be overwritten with other data is quite likely], branch prediction, etc.) that takes extra time...

Now, this clearly doesn't answer your questions about paging out, etc. And I didn't try to make a Swedish, Portuguese or German translation of my "Hello, World" message. I still believe the overhead isn't huge. If you are indeed running hundreds of instantiations of an application per second, and that application doesn't do much other than print a message to the screen after doing some simple calculations, then sure, it could be important.

The only REAL way to find out how much difference it makes is to compile the same application with `#define _(x) x` instead of `#define _(x) gettext(x)`, and see if you notice any difference.

I still think the "paged out" concern is a red herring. If the machine is under HIGH memory pressure, then it will run slowly no matter what. (If I write a piece of code that allocates 16GB on my machine [which has 16GB of RAM], just about everything except the keyboard itself (it can still blink the num-lock LED) and the mouse pointer itself (it can still move around the screen) goes unresponsive.)

Mats Petersson
  • Thanks for the reply; A hash-table would mean AFAIK that for each translation, a hash would have to be created at runtime and the hash has to be found in the table. In that case, your "Hello World" application would probably run several times slower than without gettext. – Robby75 Aug 16 '13 at 09:35
  • regarding point 2: It is very important where they are located. When they are located near the main program, the chance is very high that both are on the same page! A program that is spread out over many pages will cause many more page misses than a program that only uses one page. – Robby75 Aug 16 '13 at 09:38
  • Yes, but for the text to be on the same page as your code requires that the code is very small (less than 4KB). If we are talking of some code that actually does something meaningful and useful, beyond printing "Hello, World\n", then it's likely that the code and text of it covers more than a single page at the very least. Calculating a hash for a string is not entirely trivial, but simpler than the processing that printf does to the format string, so I don't believe you are right there. But I've had breakfast, now looking at what libintl actually does. – Mats Petersson Aug 16 '13 at 09:59
  • @Robby75: I have updated my answer with a very simple test-case for "how much overhead does a call `gettext` actually provide". The answer is "can't ignore it, but if your application is doing something more than just print a few messages, probably not the biggest culprit". – Mats Petersson Aug 16 '13 at 11:56

Some measurements:

    for ( ; n > 0; n--) {
#ifdef I18N
            fputs(gettext("Greetings!"), stdout);
#else
            fputs("Greetings!", stdout);
#endif
            putc('\n', stdout);
    }

With n = 10000000 (10 million), and output redirected to a file. There is no po file for the locale, so the original string is printed (the output files are identical). User time in seconds:

  • 0.23 with I18N undefined
  • 4.43 with I18N
  • 2.33 with I18N and LC_ALL=C

Overhead of 0.4 microseconds per call (on a Phenom X6 @ 3.6GHz, Fedora 19). With LC_ALL=C the overhead is only 0.2 µs. Note that this is probably the worst case - usually you'll do more in your program. Still, it's a factor of 20, and that includes the IO. gettext() is rather slower than I would have expected.

Memory use I have not measured, as it probably depends on the size of the po file. Startup time I don't know how to measure.

Chris
  • Maybe set n to 1 and start it in a script: `for i in \`seq 10000000\` ; do run_test_program; done` to measure startup time. In fact this scenario is exactly what is done in the real world very often with command-line tools. – Robby75 Aug 16 '13 at 20:38
  • Another thing: You say that you have no po-file; that would mean that with a po-file the overhead would be still greater, because it would have to be parsed/searched/etc. - so a still worse worst-case scenario ;-) – Robby75 Aug 16 '13 at 20:42