Questions tagged [utf-16]

UTF-16 is a variable-width character encoding for Unicode: each code point is encoded as either one 16-bit code unit (2 bytes) or a surrogate pair of two 16-bit code units (4 bytes).
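As a quick illustration of the variable width, the sketch below encodes single characters with Python's built-in `utf-16-le` codec, which writes no byte order mark, and counts the resulting bytes. Python is used only for illustration, and `utf16_byte_length` is a hypothetical helper. Code points in the Basic Multilingual Plane (up to U+FFFF) take 2 bytes, and code points from U+10000 to U+10FFFF take 4.

```python
def utf16_byte_length(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-16 (no BOM)."""
    # 'utf-16-le' is used instead of 'utf-16' so that no 2-byte BOM is prepended.
    return len(ch.encode("utf-16-le"))


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    # U+0041, U+00E9 and U+20AC report 2 bytes; U+1F600 reports 4 bytes.
    print(f"U+{ord(ch):04X}: {utf16_byte_length(ch)} bytes")
```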

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
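The encoding steps in RFC 2781 (section 2.1) and the matching decoding steps (section 2.2) can be sketched by hand as follows. This is an illustrative Python 3.9+ sketch that works on integer code units and is not meant to replace a real codec; the function names are hypothetical. A code point below U+10000 becomes a single code unit. Otherwise 0x10000 is subtracted, and the resulting 20-bit value is split into a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits).

```python
def encode_code_point(u: int) -> list[int]:
    """Encode one Unicode scalar value as UTF-16 code units (RFC 2781, 2.1)."""
    if not 0 <= u <= 0x10FFFF or 0xD800 <= u <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {u:#x}")
    if u < 0x10000:
        return [u]                       # BMP: a single 16-bit code unit
    u_prime = u - 0x10000                # fits in 20 bits (<= 0xFFFFF)
    w1 = 0xD800 | (u_prime >> 10)        # high 10 bits -> high surrogate
    w2 = 0xDC00 | (u_prime & 0x3FF)      # low 10 bits  -> low surrogate
    return [w1, w2]


def decode_code_units(units: list[int]) -> list[int]:
    """Decode UTF-16 code units back into code points (RFC 2781, 2.2)."""
    out, i = [], 0
    while i < len(units):
        w1 = units[i]
        if w1 < 0xD800 or w1 > 0xDFFF:
            out.append(w1)               # not a surrogate: the value itself
            i += 1
            continue
        # w1 must be a high surrogate followed by a low surrogate.
        if w1 > 0xDBFF or i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
            raise ValueError(f"ill-formed surrogate sequence at index {i}")
        w2 = units[i + 1]
        out.append((((w1 & 0x3FF) << 10) | (w2 & 0x3FF)) + 0x10000)
        i += 2
    return out


print([hex(w) for w in encode_code_point(0x1F600)])  # ['0xd83d', '0xde00']
print(decode_code_units([0x41, 0xD83D, 0xDE00]) == [0x41, 0x1F600])  # True
```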

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and UTF-16 with a leading byte order mark (BOM) that identifies the byte order.
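The sketch below shows the same text in both byte orders and one way to honor a BOM when reading. It uses Python's standard `utf-16-le` and `utf-16-be` codecs and the `codecs.BOM_UTF16_LE` / `codecs.BOM_UTF16_BE` constants; `decode_utf16_with_bom` is a hypothetical helper. Falling back to big-endian when no BOM is present follows the recommendation in RFC 2781, but BOM-less little-endian data is common in practice, so treat the default as an assumption to adjust.

```python
import codecs

text = "A\U0001F600"  # 'A' followed by U+1F600, a surrogate pair in UTF-16

le = text.encode("utf-16-le")  # little-endian, no BOM
be = text.encode("utf-16-be")  # big-endian, no BOM
print(le.hex(" "))  # 41 00 3d d8 00 de
print(be.hex(" "))  # 00 41 d8 3d de 00


def decode_utf16_with_bom(data: bytes, default: str = "utf-16-be") -> str:
    """Choose the byte order from a leading BOM; use `default` if there is none."""
    if data.startswith(codecs.BOM_UTF16_LE):   # b'\xff\xfe'
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):   # b'\xfe\xff'
        return data[2:].decode("utf-16-be")
    # No BOM: assume big-endian, as RFC 2781 recommends (an assumption here).
    return data.decode(default)


print(decode_utf16_with_bom(codecs.BOM_UTF16_LE + le) == text)  # True
```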


1193 questions
56 votes, 9 answers

How to solve "unable to switch the encoding" error when inserting XML into SQL Server

I'm trying to insert into XML column (SQL SERVER 2008 R2), but the server's complaining: System.Data.SqlClient.SqlException (0x80131904): XML parsing: line 1, character 39, unable to switch the encoding I found out that the XML column has to be…
veljkoz
55 votes, 7 answers

What is the Java's internal represention for String? Modified UTF-8? UTF-16?

I searched Java's internal representation for String, but I've got two materials which look reliable but inconsistent. One is: http://www.codeguru.com/cpp/misc/misc/multi-lingualsupport/article.php/c10451 and it says: Java uses UTF-16 for the…
Johnny Lim
45 votes, 8 answers

Why does the Java char primitive take up 2 bytes of memory?

Is there any reason why Java char primitive data type is 2 bytes unlike C which is 1 byte? Thanks
realnumber
41 votes, 6 answers

JavaScript strings outside of the BMP

BMP being Basic Multilingual Plane. According to JavaScript: the Good Parts: JavaScript was built at a time when Unicode was a 16-bit character set, so all characters in JavaScript are 16 bits wide. This leads me to believe that JavaScript uses…
Delan Azabani
37 votes, 3 answers

Enter Unicode characters with 8-digit hex code

How do I enter Unicode characters like 𝓭 without copying it to the clipboard and pasting it? Things I know: The command ga on the character gives me hex:0001d4ed. I can copy it on the clipboard and paste it via "+p. I know how to enter Unicode…
epsilonhalbe
37 votes, 5 answers

Confusing sizeof(char) by ISO/IEC in different character set encoding like UTF-16

Assuming that a program is running on a system with UTF-16 encoding character set. So according to The C++ Programming Language - 4th, page 150: A char can hold a character of the machine’s character set. → I think that a char variable will have…
kembedded
35 votes, 3 answers

What is the Unicode U+001A Character? Aka 0x1A

The U+001A character appears frequently in error messages relating to character encoding. What is the U+001A character?
KevSheedy
  • 3,195
  • 4
  • 22
  • 26
34 votes, 11 answers

Convert UTF-16 to UTF-8 under Windows and Linux, in C

I was wondering if there is a recommended 'cross' Windows and Linux method for the purpose of converting strings from UTF-16LE to UTF-8? or one should use different methods for each environment? I've managed to google few references to 'iconv' , but…
DooriBar
33 votes, 5 answers

UTF8 vs. UTF16 vs. char* vs. what? Someone explain this mess to me!

I've managed to mostly ignore all this multi-byte character stuff, but now I need to do some UI work and I know my ignorance in this area is going to catch up with me! Can anyone explain in a few paragraphs or less just what I need to know so that I…
dicroce
31 votes, 2 answers

Utf8_general_ci or utf8mb4 or...?

utf16 or utf32? I'm trying to store content in a lot of languages. Some of the languages use double-wide fonts (for example, Japanese fonts are frequently twice as wide as English fonts). I'm not sure which kind of database I should be using. …
Wolfpack'08
30 votes, 7 answers

Is there any reason to prefer UTF-16 over UTF-8?

Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16. However, checking out Java and C#, it looks like strings and chars there default to UTF-16. I was thinking that it might be for historic reasons, or perhaps for…
Oak
29 votes, 2 answers

Why does Java char use UTF-16?

I have been reading about how Unicode code points have evolved over time, including this article by Joel Spolsky, which says: Some people are under the misconception that Unicode is simply a 16-bit code where each character takes 16 bits and…
FZE
28 votes, 5 answers

JavaScript strings - UTF-16 vs UCS-2?

I've read in some places that JavaScript strings are UTF-16, and in other places they're UCS-2. I did some searching around to try to figure out the difference and found this: Q: What is the difference between UCS-2 and UTF-16? A: UCS-2 is obsolete…
patorjk
28 votes, 8 answers

What's the best way to export UTF8 data into Excel?

So we have this web app where we support UTF8 data. Hooray UTF8. And we can export the user-supplied data into CSV no problem - it's still in UTF8 at that point. The problem is when you open a typical UTF8 CSV up in Excel, it reads it as ANSII…
Billy Gray
28 votes, 2 answers

How can I read a file encoded in utf-16 in nodejs?

I have to read a file encoded in UTF-16 using nodejs (in chunks because it is very large). The data from the file will go into a mongodb, so I will need to convert it into utf-8. From googling, it seems that this is just plain not supported by Node,…
Ryan Ballantyne