Questions tagged [utf-16]

UTF-16 is a character encoding that represents each Unicode code point using either 2 or 4 bytes.

UTF-16 is a character encoding that represents each code point as one or two 16-bit code units, that is, as a sequence of either two or four bytes. It is therefore a variable-width character encoding.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
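As a rough illustration of that algorithm, here is a minimal Python sketch. The function name `encode_utf16_units` is hypothetical and chosen only for this example. It returns 16-bit code units rather than bytes, so byte order is deliberately left out.

```python
from __future__ import annotations


def encode_utf16_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units (16-bit integers) for one code point.

    Hypothetical helper for illustration; it follows the surrogate-pair
    scheme for code points above U+FFFF.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")

    # Code points in the Basic Multilingual Plane fit in one code unit.
    if code_point < 0x10000:
        return [code_point]

    # Otherwise subtract 0x10000 and split the remaining 20 bits in two.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits -> low surrogate
    return [high, low]


if __name__ == "__main__":
    units = encode_utf16_units(0x1F62D)
    print([hex(u) for u in units])  # ['0xd83d', '0xde2d']
```

Serializing each code unit as two bytes in the chosen byte order then gives the final byte sequence.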

There are three flavors of UTF-16: little-endian, big-endian, and with a byte order mark (BOM).
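The difference between the three flavors can be seen with Python's built-in codecs. This is only a sketch; the sample string is an arbitrary choice for the example, and the byte order chosen by the plain `utf-16` codec depends on the machine running it.

```python
import codecs

text = "A\u20ac"  # 'A' and the euro sign, both in the Basic Multilingual Plane

big_endian = text.encode("utf-16-be")     # no BOM, most significant byte first
little_endian = text.encode("utf-16-le")  # no BOM, least significant byte first
with_bom = text.encode("utf-16")          # BOM first, then native-order bytes

print(big_endian.hex(" "))     # 00 41 20 ac
print(little_endian.hex(" "))  # 41 00 ac 20

# The BOM is U+FEFF written in the byte order used for the rest of the data.
print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True

# When decoding with "utf-16", the BOM tells the decoder which order to use.
print(with_bom.decode("utf-16") == text)  # True
```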

1193 questions
5
votes
1 answer

How to Print UTF-16 Characters in C?

I have a file containing UTF-16 characters. I read in the file and can store the characters either in a uint16_t array or a char array (any better choice?). But how do I print those characters?
Edwin Lee
  • 3,540
  • 6
  • 29
  • 36
5
votes
2 answers

Finding "actual" characters (graphemes) in a QString

Let's say I have a QString that may consist of any Unicode characters, and I want to iterate through its characters or count them. And by "characters" I mean what the user perceives as such (so roughly equivalent to "glyphs") and not simply QChars…
Sebastian Negraszus
  • 11,915
  • 7
  • 43
  • 70
5
votes
1 answer

SQL Server Management Studio - Grid Result Save As .CSV - How to output Text instead of UTF-16 (Unicode)

In SQL Server Management Studio, can the Grid "Save As" be changed to write out an encoding that is Text instead of UTF-16? When I right-click a Result Grid in SQL Server Management Studio, it allows for a Save As .CSV. Currently it saves the…
Gerhard Weiss
  • 9,343
  • 18
  • 65
  • 67
5
votes
1 answer

Java Swing - JTextField/JTextArea unable to paste supplemental unicode characters

I have done an exhaustive search of Stack Overflow and Google, but I have so far been unable to find others having a similar problem. In a sample Java Swing test program, I create a plain JTextField so that I can try to paste characters into it from…
Locriansax
  • 133
  • 8
5
votes
1 answer

Encoding a UTF-16 Byte Array into a string character C# .NET

I have a byte array which I believe correctly stores a UTF-16 encoded Surrogate Pair for the unicode character Running that byte array through .Net System.Text.Encoding.Unicode.GetString() returns non-expected results. Actual results: �� Expected…
user989056
  • 1,275
  • 2
  • 15
  • 33
5
votes
1 answer

Why does `getline` on `wifstream` read garbled input from UTF-16 encoded file?

While trying to read a UTF-16 encoded file with hints from this answer, I ran into the problem that, after reading a few thousand characters, the getline method starts to read in garbage (mojibake). Here is my main: #include #include…
5
votes
1 answer

Wrong bytes from UTF-16 encoding

I have a character '😭'. Its Unicode value is U+1F62D, whose binary equivalent is 11111011000101101. Now I want to convert this character to a byte array. My steps: 1) As the binary representation is bigger than 2 bytes, I use 4 bytes XXXXXXXX XXXXXXX1 11110110…
Almas Abdrazak
  • 3,209
  • 5
  • 36
  • 80
5
votes
2 answers

Maximum UTF-8 string size given UTF-16 size

What is the formula for determining the maximum number of UTF-8 bytes required to encode a given number of UTF-16 code units (i.e. the value of String.Length in C# / .NET)? I see 3 possibilities: # of UTF-16 code units x 2 # of UTF-16 code units x…
Mike Marynowski
  • 3,156
  • 22
  • 32
5
votes
2 answers

How to get a character from its UTF-16 code points in Python 3?

I have a list of UTF-16 code points that I need to convert to the actual characters they represent programmatically. This seems unbelievably hard to do in Python 3. For example, I have the numbers 55357 and 56501 for one character, which I know is…
Ullallulloo
  • 1,105
  • 4
  • 16
  • 31
5
votes
3 answers

How to convert UTF-16 to and from ASCII

I'm writing a subroutine in MIPS assembly language to convert ASCII into UTF-16 and vice versa. However, I could not find any trick for doing the conversion.
Yunus Eren Güzel
  • 3,018
  • 11
  • 36
  • 63
5
votes
2 answers

Struggling with utf-16 encoding/decoding

I'm parsing a document that has some UTF-16 encoded strings. I have a byte string that contains the following: my_var = b'\xc3\xbe\xc3\xbf\x004\x004\x000\x003\x006\x006\x000\x006\x00-\x001\x000\x000\x003\x008\x000\x006\x002\x002\x008\x005' When…
Cyril N.
  • 38,875
  • 36
  • 142
  • 243
5
votes
5 answers

What issues would come from treating UTF-16 as a fixed 16-bit encoding?

I was reading a few questions on SO about Unicode and there were some comments I didn't fully understand, like this one: Dean Harding: UTF-8 is a variable-length encoding, which is more complex to process than a fixed-length encoding. Also, see…
Danny Tuppeny
  • 40,147
  • 24
  • 151
  • 275
5
votes
3 answers

How do I accomplish random reads of a UTF8 file

My understanding is that reads to a UTF8 or UTF16 Encoded file can't necessarily be random because of the occasional surrogate byte (used in Eastern languages for example). How can I use .NET to skip to an approximate position within the file, and…
makerofthings7
  • 60,103
  • 53
  • 215
  • 448
5
votes
3 answers

What character encoding does ObjectOutputStream 's writeObject method use?

I read that Java uses UTF-16 encoding internally, i.e. I understand that if I have something like: String var = "जनमत"; then the "जनमत" will be encoded in UTF-16 internally. So, if I dump this variable to some file such as below: fileOut = new…
5
votes
2 answers

In Python 3 how to print unicode codepoint as u'\U...'

For whatever reason, I thought it would be neat to create a table of emoji I'm interested in. First column would be the codepoint, second the emoji, third the name. Something along the lines of this web page, but tailored to my use. Full emoji…
mcwizard
  • 451
  • 5
  • 15