
I'm unable to set UTF-16, or any form thereof, as the locale on my Linux box. Sample code:

#include <clocale>   // declares std::setlocale and LC_ALL
#include <iostream>

int main()
{
    // setlocale returns a null pointer if the requested locale is not available
    const char *ret = std::setlocale(LC_ALL, "en_US.utf16");
    if (ret) {
        std::cout << ret << std::endl;
    }
    return 0;
}

Nothing is printed, which means setlocale returned a null pointer and the requested locale was not set.

The list of supported locales on the box (checked with locale -a) does not include any UTF-16 encoding.

$ uname -a
Linux developer.com 2.6.32-279.1.1.el6.x86_64 #1 SMP Tue Jul 10 11:24:23 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux

Does something need to be installed to use UTF-16 on the box?

Maddy

1 Answer


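You won't be able to set UTF-16 as a locale on Linux, because UTF-16 is not ASCII-compatible. C strings are null-terminated byte strings, and UTF-16 text routinely contains zero bytes (for example, "A" is encoded as the bytes 0x41 0x00 in UTF-16LE), so every char-based C function would stop at the first one. You need to stick to UTF-8.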

If you want to generate more locales than your system currently has, edit /etc/locale.gen and then run locale-gen as root to generate the newly added locales. (That file and command are used on Debian-based distributions; on Red Hat-based systems such as your EL6 box, locales are built with localedef instead.) But beware: even this way you won't be able to generate a UTF-16 locale!

Nidhoegger
  • UTF-16 does NOT contain embedded NUL 'characters'. It does contain NUL 'octets'. UTF-16 code units are 16 bits wide (two octets each; characters outside the BMP take two code units). – Doug Royer Apr 25 '20 at 22:15
  • However, UTF-16 strings do terminate with 0x0000, as that is the NUL character in UTF-16. Also note that both UTF-16LE and UTF-16BE exist. – Doug Royer Apr 25 '20 at 22:16
  • If you look at it from the perspective of a C-String it does contain NUL characters. And that is what I have written. – Nidhoegger Apr 27 '20 at 08:27
  • If you meant an ASCII or UTF-8 string, you're correct. But the question was about UTF-16, not an 8-bit string. – Doug Royer Apr 28 '20 at 19:21
  • I meant C-String in the context of Linux (which this is about). So yes. – Nidhoegger Apr 29 '20 at 08:37