Questions tagged [utf-16]

UTF-16 is a variable-width character encoding that represents each Unicode code point as one or two 16-bit code units, that is, as a sequence of either two or four bytes.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
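
As a rough illustration of that algorithm, the following Python sketch maps a single code point to its UTF-16 code units. The function name `encode_utf16_code_units` is made up for this example; it is not part of any library, and the sketch covers only the code-unit step, not byte order.

```python
def encode_utf16_code_units(code_point):
    """Return the UTF-16 code units (as integers) for one code point.

    Hypothetical helper written for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Basic Multilingual Plane: one 16-bit code unit (two bytes).
        return [code_point]
    # Supplementary planes: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)    # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


print([hex(u) for u in encode_utf16_code_units(0x0024)])   # ['0x24']
print([hex(u) for u in encode_utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
```

Writing each code unit out as two bytes in the chosen byte order gives the two-byte or four-byte sequences mentioned above.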

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and UTF-16 with a byte order mark (BOM).
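
A short Python sketch of the three flavors, using the standard `utf-16-le`, `utf-16-be` and `utf-16` codecs. The sample string is an arbitrary choice for this example. The byte order that the plain `utf-16` encoder writes after the BOM depends on the platform, so the sketch checks only that a BOM is present.

```python
import codecs

text = "a\u20ac"  # 'a' and the euro sign, both in the Basic Multilingual Plane

little = text.encode("utf-16-le")  # no BOM, low byte of each unit first
big = text.encode("utf-16-be")     # no BOM, high byte of each unit first
with_bom = text.encode("utf-16")   # BOM first, then platform byte order

print(little.hex())  # 6100ac20
print(big.hex())     # 006120ac
print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True

# The generic "utf-16" decoder reads the BOM to pick the byte order.
assert (codecs.BOM_UTF16_LE + little).decode("utf-16") == text
assert (codecs.BOM_UTF16_BE + big).decode("utf-16") == text
```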

1193 questions
4
votes
4 answers

How can I convert UTF-8 to UTF-16 in Excel VBA?

As far as I know, Excel uses UTF-16 to represent string literals. I read from a console (Mac) / file (Windows), and in both cases the character encoding is messed up. I have to find a solution which works on both platforms, so ADO stream is not an…
Attila
4
votes
3 answers

Is it possible to set a text file to UTF-16?

My code for writing text works for ANSI characters, but when I try to write Japanese characters they do not appear. Do I need to use UTF-16 encoding? If so, how would I do it in code? std::wstring filename; std::wstring text; filename =…
4
votes
0 answers

How to make char16_t acceptable as a template parameter to basic_ifstream?

I am using C++17 on macOS and char16_t is not acceptable as a template parameter, as follows: basic_ifstream<char16_t> file("c:\\file.txt", ios_base::ate); streamsize size = file.tellg(); file.seekg(0, ios_base::beg); u16string str(size/2,…
Lion King
4
votes
2 answers

Boost libraries for UTF-16 strings?

Are there any boost libraries to help with UTF-16 (or higher) strings?
Paul Manta
4
votes
1 answer

Is UTF-16 a superset of ASCII? If yes, why is UTF-16 incompatible with ASCII according to the HTML Standard?

According to the Wikipedia article on UTF-16, "...[UTF-16] is also the only web-encoding incompatible with ASCII." (at the end of the abstract.) This statement refers to the HTML Standard. Is this a wrong statement? I'm mainly a C# / .NET dev, and…
feO2x
4
votes
3 answers

How can I get the hex value of an input string using C++?

I just started working with C++, and after a few weeks I figured out that C++ doesn't provide a method or library to convert a string to a hexadecimal value. Currently, I'm working on a method that will return the hexadecimal value of an input string encoded in…
Nguyễn Đức Tâm
4
votes
1 answer

Why are Unicode code points always written with at least 2 bytes?

Why are Unicode code points always written with 2 bytes (4 digits) even when that's not necessary? From the Wikipedia page about UTF-8: $ -> U+0024, ¢ -> U+00A2
Radioreve
4
votes
3 answers

Split UTF-16 String into single chars/strings

I have a string that looks like this: abc, and I want to split it into single chars/strings. static List split(String text ) { List list = new ArrayList<>(text.length()); for(int i = 0; i < text.length() ; i++) { …
MAGx2
4
votes
2 answers

Reading a UTF-16 file in C++

I'm trying to read a file which is encoded as UTF-16LE with a BOM. I tried this code: #include #include #include #include int main() { std::wifstream fin("/home/asutp/test"); …
Kot Shrodingera
4
votes
2 answers

How can I use Mac OS X (and UNIX) command line tools like grep with UTF-16 files?

I have a bunch of text files I want to use with grep. They are all from an external source and are UTF-16 encoded and begin with a byte order mark. Unix tools like grep don't work on them for me. What work-around is there for this?
Steve McLeod
4
votes
1 answer

Why does Windows use ANSI Code page instead of UNICODE?

When I run the command chcp in a cmd.exe window, it represents the code page used in Windows. I think Windows uses the UNICODE character set. So, my questions are: Why does Windows use ANSI codepages instead of Unicode? Windows uses UTF-16 or…
JaeHyeok Kim
4
votes
3 answers

Length of a string in Python 3.5 with different encodings

I tried this in python to get the length of a string in bytes. >>> s = 'a' >>> s.encode('utf-8') b'a' >>> s.encode('utf-16') b'\xff\xfea\x00' >>> s.encode('utf-32') b'\xff\xfe\x00\x00a\x00\x00\x00' >>> len(s.encode('utf-8')) 1 >>>…
Z-Jiang
4
votes
0 answers

"UnicodeError: UTF-16 stream does not start with BOM" when opening file that apparently has a BOM

I have a project in which most of the files are UTF-16 but one is UTF-8. Having put the correct encoding ("utf_8" or "utf_16") into strOpenEncoding, I tried this: for strInput in open(strInputFileName, "r", newline="\n",…
Stephen
4
votes
2 answers

What is a safe length of JavaScript strings?

Considering charAt(), charCodeAt(), and codePointAt(), I find a discrepancy in what the parameter means. Before I really thought about it, I thought you would always be safe to access the character at length-1. But I read the difference between…
Clive
4
votes
2 answers

UCS2 vs UTF: which languages cannot be displayed in the UCS2 encoding?

UCS2 is easier to use in Visual C++ than UTF encodings. Which languages can I not support with the UCS2 encoding?
KindDragon