Questions tagged [utf-16]

UTF-16 is a variable-width character encoding that represents each Unicode code point as one or two 16-bit code units, that is, as a sequence of either two or four bytes.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
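
As a rough illustration of that algorithm, here is a minimal Python sketch. The function name `utf16_code_units` is hypothetical and chosen only for this example; the sketch returns 16-bit code units rather than bytes, so byte order is left out of the picture.

```python
from typing import List


def utf16_code_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units (16-bit integers) for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Code points in the Basic Multilingual Plane fit in a single unit.
        return [code_point]
    # Supplementary code points: subtract 0x10000, then split the
    # remaining 20 bits into a high and a low surrogate of 10 bits each.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)
    low = 0xDC00 | (offset & 0x3FF)
    return [high, low]


# U+10428 needs a surrogate pair; U+0041 fits in one code unit.
print([hex(u) for u in utf16_code_units(0x10428)])  # ['0xd801', '0xdc28']
print([hex(u) for u in utf16_code_units(0x41)])     # ['0x41']
```

The result can be cross-checked against Python's built-in codec: `"\U00010428".encode("utf-16-be")` yields the same two code units in big-endian byte order.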

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and UTF-16 with a byte order mark (BOM); see endianness.
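
The difference between the three flavors can be seen with Python's standard codecs. This is only a sketch: the sample string is arbitrary, and the behavior of the plain `"utf-16"` codec (prepend a BOM, then use the platform's native byte order) is specific to Python's encoder.

```python
# Sample text: one BMP character and one supplementary character.
text = "A\U00010428"

be = text.encode("utf-16-be")     # big-endian, no BOM
le = text.encode("utf-16-le")     # little-endian, no BOM
with_bom = text.encode("utf-16")  # BOM first, then native byte order

print(be.hex(" "))  # 00 41 d8 01 dc 28
print(le.hex(" "))  # 41 00 01 d8 28 dc

# The BOM is U+FEFF, so its byte order reveals the order of what follows.
bom = with_bom[:2]
print("little-endian" if bom == b"\xff\xfe" else "big-endian")

# Decoding with "utf-16" consumes the BOM and picks the matching byte order.
assert with_bom.decode("utf-16") == text
```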

Related tags

The unicode character set it serializes
Other UTFs: utf-8, utf-16, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36

1193 questions

1 answer

How to write 3 bytes unicode literal in Java?

I'd like to write unicode literal U+10428 in Java. http://www.marathon-studios.com/unicode/U10428/Deseret_Small_Letter_Long_I I tried with '\u10428' and it doesn't compile.

java unicode utf-16 utf-32 unicode-literals

asked Jul 08 '14 at 13:35

kawty

6 answers

Writing utf16 to file in binary mode

I'm trying to write a wstring to file with ofstream in binary mode, but I think I'm doing something wrong. This is what I've tried: ofstream outFile("test.txt", std::ios::out | std::ios::binary); wstring hello = L"hello"; outFile.write((char *)…

c++ unicode utf-16

asked Oct 16 '08 at 07:17

Cactuar

3 answers

Encode/Decode std::string to UTF-16

I have to handle a file format (both read from and write to it) in which strings are encoded in UTF-16 (2 bytes per character). Since characters out of the ASCII table are rarely used in the application domain, all of the strings in my C++ model…

c++ utf-16 stdstring

asked Jun 18 '12 at 15:37

Peter

7 answers

Is there a standard technique for packing binary data into a UTF-16 string?

(In .NET) I have arbitrary binary data stored in a byte[] (an image, for example). Now, I need to store that data in a string (a "Comment" field of a legacy API). Is there a standard technique for packing this binary data into a string? By…

.net unicode encoding binary utf-16

asked Mar 15 '09 at 00:01

Ðаn

2 answers

How was the position of the Surrogates Area (UTF-16) chosen?

Was the position of the UTF-16 surrogates area (U+D800..U+DFFF) chosen at random, or is there a logical reason for placing it there?

unicode utf-16

asked Mar 03 '11 at 08:21

sid_com

3 answers

UTF-16 Encoding in Java versus C#

I am trying to read a String in UTF-16 encoding scheme and perform MD5 hashing on it. But strangely, Java and C# are returning different results when I try to do it. The following is the piece of code in Java: public static void main(String[] args)…

c# java encoding md5 utf-16

asked Jan 25 '11 at 12:19

rkg

4 answers

std::wstring length

What is the result of the std::wstring.length() function: the length in wchar_t(s) or the length in symbols? And why? TCHAR r2[3]; r2[0] = 0xD834; // D834, DD1E - musical G clef r2[1] = 0xDD1E; // r2[2] = 0x0000; // '\0' std::wstring r =…

c++ string encoding std utf-16

asked Nov 15 '10 at 11:12

Julian Popov

5 answers

How does Microsoft handle the fact that UTF-16 is a variable length encoding in their C++ standard library implementation

Having a variable length encoding is indirectly forbidden in the standard. So I have several questions: How is the following part of the standard handled? 17.3.2.1.3.3 Wide-character sequences A wide-character sequence is an array object (8.3.4) A…

c++ utf-16

asked Oct 26 '10 at 15:54

Šimon Tóth

4 answers

Is there a drastic difference between UTF-8 and UTF-16

I call a webservice, that gives me back a response xml that has UTF-8 encoding. I checked that in java using getAllHeaders() method. Now, in my java code, I take that response and then do some processing on it. And later, pass it on to a different…

java xml utf-8 character-encoding utf-16

asked Mar 14 '14 at 12:04

Kraken

2 answers

Why does Powershell file concatenation convert UTF8 to UTF16?

I am running the following Powershell script to concatenate a series of output files into a single CSV file. whidataXX.htm (where xx is a two digit sequential number) and the number of files created varies from run to run. $metadataPath =…

powershell utf-8 utf-16 data-conversion

asked Oct 15 '13 at 18:22

dwwilson66

7 answers

Dummy's guide to Unicode

Could anyone give me concise definitions of Unicode, UTF7, UTF8, UTF16, UTF32 and codepages, and of how they differ from ASCII/ANSI/Windows-1252? I'm not after Wikipedia links or incredible detail, just some brief information on how and why the huge variations…

unicode utf-8 utf-16 codepages

asked Sep 21 '09 at 14:58

Arec Barrwin

4 answers

Does the Unicode Consortium Intend to make UTF-16 run out of characters?

The current version of UTF-16 is only capable of encoding 1,112,064 different numbers (code points), 0x0-0x10FFFF. Does the Unicode Consortium intend to make UTF-16 run out of characters, i.e. make a code point > 0x10FFFF? If not, why would anyone…

unicode utf-8 utf-16

asked Feb 21 '12 at 19:47

GlassGhost

2 answers

Is the XML declaration tag case sensitive?

I have what is probably a really simple, stupid question, but I can't find an answer to it anywhere and I need to be pretty sure about this. I have various XML files from various vendors. One of the vendors provides me an XML file with Japanese…

xml encoding utf-16

asked May 28 '09 at 15:35

Frank V

4 answers

Python UTF-16 CSV reader

I have a UTF-16 CSV file which I have to read. The Python csv module does not seem to support UTF-16. I am using Python 2.7.2. The CSV files I need to parse are huge, running into several GBs of data. Answers for John Machin questions below print…

python csv utf-16

asked Feb 07 '12 at 14:16

venky

1 answer

How can I match emoji with an R regex?

I want to determine which elements of my vector contain emoji: x = c('😂', 'no', '🍹', '😀', 'no', '😛', '䨺', '감사') x # [1] "\U0001f602" "no" "\U0001f379" "\U0001f600" "no" "\U0001f61b" "䨺" "감사" Related posts only cover other…

r regex emoji utf-16

asked Apr 12 '17 at 02:12

MichaelChirico
