Efficiency of gettext : in-memory translation

Question

I have an embedded system with flash, a very low-end CPU, and little RAM. I want to know how efficient gettext language translation is when using a .MO file.

When fetching a localized string, does gettext read the .MO file from flash every time, or is the complete .MO binary file first loaded into RAM, with the lookup done from there?

If the .MO file (it will be large, ~1 MB, since there are a lot of strings) is always loaded into RAM, it will eat up my RAM.

Has the system you describe got an operating system? Has it got a file system? Has it got external flash, or a memory card with a file system? — gbulmer, Mar 22 '12 at 11:32
@Lunar Mushrooms - sorry I didn't get your comment before I posted. I don't know why, it is normally really good at updating. — gbulmer, Mar 22 '12 at 13:28

gbulmer · Accepted Answer · 2012-03-23T12:54:36.820

As MSalters said it is open source, so you could tweak it.

If you give a fuller definition of the system (as per my comment) we might be able to help more.

If this is a deeply embedded system (the sort of stuff I do), with no OS, and no external file system of any type, the strings must all be in memory. There will very likely be a mechanism to store those strings in flash, so that they consume no RAM.

For example, on an ARM, data structures can easily be stored in flash. To do that, you need to tell the compiler which section of the program to store them in, using something like:

const char mesg1[] __attribute__((section (".USER_FLASH"))) 
             = "Ciao a tutti";
const char mesg2[] __attribute__((section (".USER_FLASH"))) 
             = "Riesco a sentire la mia mente va Dave";

When the program is linked, the linker script needs to be written to place the strings into Flash, and they will not be copied to RAM.
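
Building on the same idea, here is a minimal sketch of a hand-rolled message table that keeps every language in read-only data. Everything in it is an assumption for illustration: the FLASH_DATA macro, the enum names, the English strings, and the tr() helper are all hypothetical. On a real target FLASH_DATA would be defined as the section attribute shown above; left empty, the code also builds with a desktop compiler.

```c
#include <stddef.h>

/* Hypothetical macro: on the target, define it as
 * __attribute__((section(".USER_FLASH"))) to match the linker script. */
#ifndef FLASH_DATA
#define FLASH_DATA
#endif

enum msg_id  { MSG_HELLO, MSG_GOODBYE, MSG_COUNT };
enum lang_id { LANG_EN, LANG_IT, LANG_COUNT };

/* A const table of const pointers, so the table itself can live in flash.
 * The string literals land wherever the toolchain puts read-only data,
 * which is typically flash as well on a microcontroller. */
static const char *const messages[LANG_COUNT][MSG_COUNT] FLASH_DATA = {
    [LANG_EN] = { [MSG_HELLO] = "Hello everyone", [MSG_GOODBYE] = "Goodbye" },
    [LANG_IT] = { [MSG_HELLO] = "Ciao a tutti",   [MSG_GOODBYE] = "Arrivederci" },
};

/* The only RAM used by this scheme: the currently selected language. */
static enum lang_id current_lang = LANG_EN;

/* Select a language; out-of-range values are ignored. */
static void set_language(enum lang_id lang)
{
    if ((unsigned)lang < (unsigned)LANG_COUNT)
        current_lang = lang;
}

/* Return the message for the current language, or "" for a bad id. */
static const char *tr(enum msg_id id)
{
    if ((unsigned)id >= (unsigned)MSG_COUNT)
        return "";
    return messages[current_lang][id];
}
```

After set_language(LANG_IT), a call such as tr(MSG_HELLO) returns a pointer straight into read-only data, so nothing is copied. The trade-off against gettext is that messages are looked up by a numeric ID rather than by the English text.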

Approximately how much space can you dedicate to messages? How much space do they take?

You may be fighting a well-researched problem: the amount of programming effort increases exponentially as resource limits are approached. It may take tremendous effort to fit stuff into the final few % of memory.

One 'obvious' tweak is to try a few simple compression techniques. One could be applied to the raw messages, which are then decompressed as they are printed.
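
As one possible illustration of decompress-while-printing, the sketch below uses a tiny fragment dictionary: byte values below FRAGMENT_COUNT stand for common substrings, and every other byte is emitted literally. This is only an assumed scheme, not something gettext does. The dictionary contents, print_packed(), and the str_sink helper are hypothetical names, and the approach presumes that the low control codes never appear in real messages.

```c
#include <stddef.h>

/* Hypothetical fragment dictionary: code N expands to fragments[N]. */
static const char *const fragments[] = {
    "",            /* code 0 is the string terminator, never used as a fragment */
    "Error: ",     /* code 1 */
    " not found"   /* code 2 */
};
#define FRAGMENT_COUNT (sizeof fragments / sizeof fragments[0])

/* Callback that receives one output character at a time (e.g. a UART putc). */
typedef void (*emit_fn)(char c, void *ctx);

/* Expand a packed message character by character, so no RAM buffer
 * for the whole uncompressed message is needed. */
static void print_packed(const char *packed, emit_fn emit, void *ctx)
{
    for (; *packed != '\0'; packed++) {
        unsigned char code = (unsigned char)*packed;
        if (code < FRAGMENT_COUNT) {
            for (const char *f = fragments[code]; *f != '\0'; f++)
                emit(*f, ctx);
        } else {
            emit(*packed, ctx);
        }
    }
}

/* Example sink that collects output in a caller-supplied buffer,
 * used here only to make the sketch easy to try on a desktop. */
typedef struct {
    char  *buf;
    size_t cap;
    size_t len;
} str_sink;

static void sink_put(char c, void *ctx)
{
    str_sink *s = ctx;
    if (s->len + 1 < s->cap)   /* keep room for a terminating '\0' */
        s->buf[s->len++] = c;
}
```

On the target, the sink would normally be the routine that writes one character to the display or serial port. How much this saves depends entirely on how repetitive the messages are, so it is worth measuring on the real message set before committing to it.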

Edit: I thought your question seemed so straightforward and natural, that I had assumed the answer would be straightforward to find.

I had a look at the gettext documentation, but failed to find it there. I downloaded the source. After 10 minutes, I honestly could not tell you how it worked. I can tell you it is much more complicated than I'd expected. I looked at the extensive documentation. Lots of documentation on how to best organise to do translation, on how to prepare the program, on things that can cause problems. Very helpful insights. Yet I could not find any documentation explaining its overall run-time architecture. None. Nothing.

My best advice is to go to the GNU gettext mailing lists, search them, and if necessary ask. The mailing list archives can be found via the project page: http://savannah.gnu.org/projects/gettext/

I apologise that I couldn't be more helpful.

Thank you very much. Your answer gave me a clue about how to handle this manually. I will also look into the gettext source code to see how the MO files are handled (I guess they are loaded into RAM; let me see). — Lunar Mushrooms, Mar 23 '12 at 04:07
As long as you are working on it, and making some progress, people will likely try to help. Good luck. — gbulmer, Mar 23 '12 at 04:27
Aha, I found a link, http://stackoverflow.com/questions/3437105/why-doesnt-gettext-have-a-db-storage-option, that says: "Also, the compiled gettext files (.mo) are optimized for loading in memory and for this reason they are more appropriate than plain text files (like not-compiled .po files)." — Lunar Mushrooms, Mar 27 '12 at 14:20

score 0 · Answer 2 · answered Jan 30 '22 at 13:25

gettext is typically used with a hash table:

When the user selects a language, the contents of the .mo file are processed to find the offset of every translation. Those offsets are stored in a hash table.
When a translated string is to be displayed, the hash of the corresponding English string is calculated, and the offset of the translated string is found using that hash.

If the flash memory in your embedded system is mapped into the address space, the English strings and the translations can be stored in flash. Only the hash table will need to be in RAM. You'll need to reserve space for one hash and one pointer per translated string. If you use CRC32 as the hash and 4-byte pointers, you'll need 8 kB of RAM for 1024 translated strings (1024 × 8 bytes).

If you don't have flash memory mapped into the address space, you'll have to either load the complete .mo file into RAM when a language is selected, or call a flash I/O routine every time you want to display a string.
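
The RAM-side table described above could look roughly like the following. This is a sketch under stated assumptions, not gettext's actual implementation: the xlat_* names are hypothetical, a 4-byte offset into a memory-mapped blob stands in for the 4-byte pointer, and linear probing is just one simple way to resolve slot clashes. Because only the hash is stored, two English strings with the same CRC32 would be confused with each other.

```c
#include <stddef.h>
#include <stdint.h>

/* One RAM entry per translated string: 4-byte hash + 4-byte offset = 8 bytes. */
typedef struct {
    uint32_t hash;     /* CRC32 of the English string */
    uint32_t offset;   /* where the translation starts inside the flash blob */
} xlat_entry;

#define XLAT_EMPTY UINT32_MAX   /* offset value marking an unused slot */

typedef struct {
    xlat_entry *slots;        /* array in RAM */
    uint32_t    capacity;     /* number of slots, must be a power of two */
    const char *flash_base;   /* start of the memory-mapped translations */
} xlat_table;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320): slow, but needs no table. */
static uint32_t crc32_str(const char *s)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (*s != '\0') {
        crc ^= (uint8_t)*s++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & ((uint32_t)0 - (crc & 1u)));
    }
    return ~crc;
}

/* Mark every slot as unused. */
static void xlat_init(xlat_table *t, xlat_entry *slots, uint32_t capacity,
                      const char *flash_base)
{
    t->slots = slots;
    t->capacity = capacity;
    t->flash_base = flash_base;
    for (uint32_t i = 0; i < capacity; i++) {
        slots[i].hash = 0;
        slots[i].offset = XLAT_EMPTY;
    }
}

/* Register one translation; returns 0 on success, -1 if the table is full. */
static int xlat_add(xlat_table *t, const char *english, uint32_t offset)
{
    uint32_t h = crc32_str(english);
    for (uint32_t n = 0; n < t->capacity; n++) {
        xlat_entry *e = &t->slots[(h + n) & (t->capacity - 1)];
        if (e->offset == XLAT_EMPTY) {
            e->hash = h;
            e->offset = offset;
            return 0;
        }
    }
    return -1;
}

/* Return the translation, or the English string itself when none is found. */
static const char *xlat_lookup(const xlat_table *t, const char *english)
{
    uint32_t h = crc32_str(english);
    for (uint32_t n = 0; n < t->capacity; n++) {
        const xlat_entry *e = &t->slots[(h + n) & (t->capacity - 1)];
        if (e->offset == XLAT_EMPTY)
            break;
        if (e->hash == h)
            return t->flash_base + e->offset;
    }
    return english;
}
```

In practice the table would be given some spare slots so that probing stays short, which pushes RAM use a little above 8 bytes per string. A sorted array with binary search would hit the 8-byte figure exactly, at the cost of slower lookups.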
