Questions tagged [utf-16]

UTF-16 is a character encoding that represents each Unicode code point as one or two 16-bit code units, that is, as a byte sequence of either two or four bytes. It is therefore a variable-width character encoding.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
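
As a rough illustration of that algorithm, here is a minimal Python sketch. The function name `encode_utf16_units` is hypothetical (it is not part of any library). It returns 16-bit code units rather than bytes, so byte order is left out of the picture.

```python
from typing import List


def encode_utf16_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units (16-bit integers) for one code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")

    # Code points below U+10000 are stored directly as a single code unit.
    if code_point < 0x10000:
        return [code_point]

    # Everything else becomes a surrogate pair: subtract 0x10000, then
    # split the remaining 20 bits into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


euro = encode_utf16_units(0x20AC)      # one code unit, two bytes
smiley = encode_utf16_units(0x1F600)   # surrogate pair, four bytes
```

With this sketch, U+20AC stays a single code unit, while U+1F600 is split into the pair 0xD83D, 0xDE00, which matches what Python's built-in `utf-16-be` codec produces.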

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and UTF-16 with a byte order mark (BOM).
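
The difference between these flavors can be seen with Python's standard codecs. This is only a sketch: the sample string is arbitrary, and the byte order chosen by the plain `utf-16` codec when encoding depends on the platform, so the example checks for either BOM instead of assuming one.

```python
import codecs

text = "€1"  # arbitrary sample: U+20AC followed by U+0031

# Explicit byte order, no BOM is written.
little = text.encode("utf-16-le")   # AC 20 31 00
big = text.encode("utf-16-be")      # 20 AC 00 31

# The plain "utf-16" codec prepends a BOM and uses the platform's byte order.
with_bom = text.encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# When decoding, the "utf-16" codec reads the BOM to pick the byte order,
# so both of these round-trip to the original string.
from_le = (codecs.BOM_UTF16_LE + little).decode("utf-16")
from_be = (codecs.BOM_UTF16_BE + big).decode("utf-16")
```

Without a BOM, the reader has to know the byte order in advance, which is why the `utf-16-le` and `utf-16-be` codecs exist as separate names.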

1193 questions
10
votes
3 answers

how can I convert wstring to u16string?

I want to convert wstring to u16string in C++. I can convert wstring to string, or the reverse. But I don't know how to convert to u16string. u16string CTextConverter::convertWstring2U16(wstring str) { int iSize; u16string szDest[256] =…
D.A.KANG
  • 265
  • 2
  • 4
  • 12
10
votes
0 answers

Spark Read/Write (csv) ISO-8859-1

I need to read an iso-8859-1 encoded file, do some operations then save it (with iso-8859-1 encoding). To test this, I'm loosely mimicking a testcase I found on the Databricks CSV…
jduff1075
  • 380
  • 1
  • 3
  • 15
10
votes
3 answers

UnicodeDecodeError on byte type

Using Python 3.4 I'm getting the following error when trying to decode a byte type using utf-32 Traceback (most recent call last): File "c:.\SharqBot.py", line 1130, in
Shariq Ali
  • 111
  • 1
  • 1
  • 4
10
votes
1 answer

Is UTF-16 compatible with UTF-8?

I asked Google the question above and was sent to Difference between UTF-8 and UTF-16? which unfortunately doesn't answer the question. From my understanding UTF-8 should be a subset of UTF-16 meaning: if my code uses UTF-16 and I hand in a UTF-8…
mike
  • 1,627
  • 1
  • 14
  • 37
10
votes
1 answer

Is there a Rust library with a UTF-16 string type? (intended for writing a JavaScript interpreter)

For most programs, it's better to use UTF-8 internally and, when necessary, convert to other encodings. But in my case, I want to write a Javascript interpreter, and it's much simpler to store only UTF-16 strings (or arrays of u16), because I need…
darque
  • 1,566
  • 1
  • 14
  • 22
10
votes
8 answers

Why the Excess Memory for Strings in Delphi?

I'm reading in a large text file with 1.4 million lines that is 24 MB in size (average 17 characters a line). I'm using Delphi 2009 and the file is ANSI but gets converted to Unicode upon reading, so you can fairly say the text once converted is 48…
lkessler
  • 19,819
  • 36
  • 132
  • 203
10
votes
2 answers

Unicode case folding to upper case

I'm trying to implement a library for reading Microsoft CFB (Compound File Binary) Format files, according to the official specification of that format. The specification is available from this site. In a nutshell - some of the structures of the…
Daniel Kamil Kozar
  • 18,476
  • 5
  • 50
  • 64
10
votes
2 answers

How to convert a utf-8 string to a utf-16 string in PHP

How do I convert a utf-8 string to a utf-16 string in PHP?
Freddo411
  • 2,293
  • 3
  • 18
  • 17
10
votes
1 answer

Unicode in Python - just UTF-16?

I was happy in my Python world knowing that I was doing everything in Unicode and encoding as UTF-8 when I needed to output something to a user. Then, one of my colleagues sent me the "UTF-8 Everywhere" manifesto (2012) and it confused…
Endophage
  • 21,038
  • 13
  • 59
  • 90
10
votes
7 answers

Extract substring by utf-8 byte positions

I have a string and start and length with which to extract a substring. Both positions (start and length) are based on the byte offsets in the original UTF8 string. However, there is a problem: The start and length are in bytes, so I cannot use…
tofutim
  • 22,664
  • 20
  • 87
  • 148
9
votes
2 answers

java.nio.charset.MalformedInputException when reading a stream

I use the following code to read data. It throws java.nio.charset.MalformedInputException. I can open the file normally, but it does include non-ASCII chars. Is there any way I can fix this problem? Source.fromInputStream(stream).getLines foreach { line…
user398384
  • 1,124
  • 3
  • 14
  • 21
9
votes
2 answers

What are the consequences of storing a C# string (UTF-16) in a SQL Server nvarchar (UCS-2) column?

It seems that SQL Server uses Unicode UCS-2, a 2-byte fixed-length character encoding, for nchar/nvarchar fields. Meanwhile, C# uses Unicode UTF-16 encoding for its strings (note: Some people don't consider UCS-2 to be Unicode, but it encodes all…
Triynko
  • 18,766
  • 21
  • 107
  • 173
9
votes
0 answers

Way to make Emacs' M-x rgrep work with both UTF8 and UTF16 files?

Is it possible to customize Emacs so that rgrep would correctly find occurrences of some pattern in both UTF8 (or even Latin) and UTF16 files? I guess we should customize grep-find-template, but can't make my way through it. EDIT 2017-06-16 I do now…
user3341592
  • 1,419
  • 1
  • 17
  • 36
9
votes
4 answers

How do I encode a JavaScript string in utf-16?

In Python 3, I can do this: >>> "€13,56".encode('utf-16') b'\xff\xfe\xac 1\x003\x00,\x005\x006\x00' The input is a (unicode) string, while the output is a sequence of raw bytes of that string encoded in utf-16. How can I do the same in JavaScript -…
Claudiu
  • 224,032
  • 165
  • 485
  • 680
9
votes
3 answers

Storing UTF-8 string in a UnicodeString

In Delphi 2007 you can store a UTF-8 string in a WideString and then pass that onto a Win32 function, e.g. var UnicodeStr: WideString; UTF8Str: WideString; begin UnicodeStr:='some unicode text'; UTF8Str:=UTF8Encode(UnicodeStr); …
Mick
  • 846
  • 2
  • 7
  • 18