Questions tagged [utf-16]

UTF-16 is a character encoding that represents each Unicode code point as a byte sequence of either two or four bytes (one or two 16-bit code units). It is therefore a variable-width character encoding.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
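
As a rough illustration of that algorithm, the sketch below maps a single code point to its UTF-16 code units. The function name `utf16_code_units` is hypothetical and chosen only for this example; the arithmetic follows the surrogate-pair scheme described in RFC 2781, and byte order is deliberately left out.

```python
def utf16_code_units(code_point):
    """Return the 16-bit UTF-16 code units for one Unicode code point.

    Hypothetical helper for illustration; byte order is not handled here.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")

    # Code points in the Basic Multilingual Plane fit in one code unit (two bytes).
    if code_point < 0x10000:
        return [code_point]

    # Everything else becomes a surrogate pair (four bytes):
    # subtract 0x10000, then split the remaining 20 bits into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


print([hex(u) for u in utf16_code_units(0x20AC)])   # ['0x20ac']
print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
```

The first call shows a two-byte character, the second a four-byte one, which is what makes the encoding variable-width.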

There are three flavors of UTF-16: little-endian, big-endian, and with a byte order mark (BOM) that signals the byte order (see endianness).
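
The three flavors can be compared with Python's built-in codecs. This is a minimal sketch, assuming Python 3; the codec names (`utf-16-le`, `utf-16-be`, `utf-16`) are Python's own labels, and the sample character is an arbitrary choice.

```python
import codecs

text = "€"  # U+20AC, an arbitrary example character

little = text.encode("utf-16-le")  # no BOM, low byte first: b'\xac\x20'
big = text.encode("utf-16-be")     # no BOM, high byte first: b'\x20\xac'
with_bom = text.encode("utf-16")   # BOM first, then the platform's native byte order

# The BOM is U+FEFF; the order of its two bytes tells a decoder which flavor follows.
assert with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# The generic "utf-16" decoder reads the BOM and picks the byte order itself.
assert with_bom.decode("utf-16") == text
assert (codecs.BOM_UTF16_BE + big).decode("utf-16") == text
```

Without a BOM, the byte order has to be known from context, which is why the explicit `-le` and `-be` codecs exist.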

Related tags

unicode: the character set that UTF-16 serializes
Other UTFs: utf-8, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36

1193 questions

3 answers

How can I convert wstring to u16string?

I want to convert wstring to u16string in C++. I can convert wstring to string, or the reverse. But I don't know how to convert to u16string. u16string CTextConverter::convertWstring2U16(wstring str) { int iSize; u16string szDest[256] =…

c++ winapi utf-16 wstring valueconverter

asked Mar 11 '17 at 11:38

D.A.KANG

0 answers

Spark Read/Write (csv) ISO-8859-1

I need to read an ISO-8859-1 encoded file, do some operations, then save it (with ISO-8859-1 encoding). To test this, I'm loosely mimicking a test case I found on the Databricks CSV…

scala apache-spark utf-16

asked Aug 24 '16 at 18:24

jduff1075

3 answers

UnicodeDecodeError on byte type

Using Python 3.4, I'm getting the following error when trying to decode a byte type using utf-32: Traceback (most recent call last): File "c:.\SharqBot.py", line 1130, in …

python python-3.x utf-8 decode utf-16

asked Mar 21 '16 at 19:41

Shariq Ali

1 answer

Is UTF-16 compatible with UTF-8?

I asked Google the question above and was sent to Difference between UTF-8 and UTF-16? which unfortunately doesn't answer the question. From my understanding UTF-8 should be a subset of UTF-16 meaning: if my code uses UTF-16 and I hand in a UTF-8…

encoding utf-8 utf-16

asked Sep 10 '15 at 10:50

mike

1 answer

Is there a Rust library with a UTF-16 string type? (intended for writing a JavaScript interpreter)

For most programs, it's better to use UTF-8 internally and, when necessary, convert to other encodings. But in my case, I want to write a JavaScript interpreter, and it's much simpler to store only UTF-16 strings (or arrays of u16), because I need…

string rust utf-16

asked Jul 28 '15 at 19:23

darque

8 answers

Why the Excess Memory for Strings in Delphi?

I'm reading in a large text file with 1.4 million lines that is 24 MB in size (average 17 characters a line). I'm using Delphi 2009 and the file is ANSI but gets converted to Unicode upon reading, so fairly you can say the text once converted is 48…

delphi memory-management delphi-2009 utf-16 fastmm

asked Nov 23 '08 at 04:38

lkessler

2 answers

Unicode case folding to upper case

I'm trying to implement a library for reading Microsoft CFB (Compound File Binary) Format files, according to the official specification of that format. The specification is available from this site. In a nutshell - some of the structures of the…

unicode utf-16 case-folding

asked Nov 24 '13 at 21:52

Daniel Kamil Kozar

2 answers

How to convert a utf-8 string to a utf-16 string in PHP

How do I convert a utf-8 string to a utf-16 string in PHP?

php utf-8 utf-16

asked Sep 30 '08 at 23:04

Freddo411

1 answer

Unicode in Python - just UTF-16?

I was happy in my Python world knowing that I was doing everything in Unicode and encoding as UTF-8 when I needed to output something to a user. Then, one of my colleagues sent me the "UTF-8 Everywhere" manifesto (2012) and it confused…

python unicode character-encoding utf-16

asked Oct 26 '12 at 22:55

Endophage

7 answers

Extract substring by utf-8 byte positions

I have a string and start and length with which to extract a substring. Both positions (start and length) are based on the byte offsets in the original UTF8 string. However, there is a problem: The start and length are in bytes, so I cannot use…

javascript string utf-8 character-encoding utf-16

asked Jun 26 '12 at 03:35

tofutim

2 answers

java.nio.charset.MalformedInputException when reading a stream

I use the following code to read data. It throws java.nio.charset.MalformedInputException. I can open the file normally, but it does include non-ASCII chars. Is there any way I can fix this problem? Source.fromInputStream(stream).getLines foreach { line…

scala utf-8 stream decoding utf-16

asked Jul 30 '11 at 19:14

user398384

2 answers

What are the consequences of storing a C# string (UTF-16) in a SQL Server nvarchar (UCS-2) column?

It seems that SQL Server uses Unicode UCS-2, a 2-byte fixed-length character encoding, for nchar/nvarchar fields. Meanwhile, C# uses Unicode UTF-16 encoding for its strings (note: Some people don't consider UCS-2 to be Unicode, but it encodes all…

sql-server character-encoding utf-16 ucs2 codepoint

asked Apr 13 '11 at 20:36

Triynko

0 answers

Way to make Emacs' M-x rgrep work with both UTF8 and UTF16 files?

Is it possible to customize Emacs so that rgrep would correctly find occurrences of some pattern in both UTF8 (or even Latin) and UTF16 files? I guess we should customize grep-find-template, but can't make my way through it. EDIT 2017-06-16 I do now…

utf-8 emacs utf-16

asked May 24 '17 at 08:53

user3341592

4 answers

How do I encode a JavaScript string in utf-16?

In Python 3, I can do this: >>> "€13,56".encode('utf-16') b'\xff\xfe\xac 1\x003\x00,\x005\x006\x00' The input is a (unicode) string, while the output is a sequence of raw bytes of that string encoded in utf-16. How can I do the same in JavaScript -…

javascript unicode encoding utf-16

asked Jun 02 '16 at 15:57

Claudiu

3 answers

Storing UTF-8 string in a UnicodeString

In Delphi 2007 you can store a UTF-8 string in a WideString and then pass that onto a Win32 function, e.g. var UnicodeStr: WideString; UTF8Str: WideString; begin UnicodeStr:='some unicode text'; UTF8Str:=UTF8Encode(UnicodeStr); …

string delphi unicode utf-8 utf-16

asked Apr 23 '10 at 10:38

Mick
