Questions tagged [utf-16]

UTF-16 is a character encoding that represents each Unicode code point using either 2 or 4 bytes.

UTF-16 is a character encoding that encodes each Unicode code point as one or two 16-bit code units, that is, as a byte sequence of either two or four bytes. It is therefore a variable-width character encoding.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
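As a rough illustration of that algorithm, here is a minimal Python sketch. The function name `encode_utf16_code_units` is made up for this example, and rejecting lone surrogate values is a choice made by this sketch; treat it as an outline under those assumptions rather than a reference implementation.

```python
def encode_utf16_code_units(code_point):
    """Return the UTF-16 code units (16-bit integers) for one code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        # Assumption of this sketch: surrogate values are not encodable.
        raise ValueError("surrogate values are not valid characters")
    if code_point < 0x10000:
        # Basic Multilingual Plane: a single 16-bit code unit.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)    # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


units = encode_utf16_code_units(0x1F600)  # U+1F600, an emoji outside the BMP
print([hex(u) for u in units])
```

Code points below U+10000 come back as a single code unit (two bytes); everything above comes back as a surrogate pair (four bytes), which is where the "two or four bytes" in the description comes from.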

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and UTF-16 with a BOM (see endianness).
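The difference between the flavors can be seen with Python's built-in codecs. This is only a small sketch; it assumes CPython's standard `utf-16-le`, `utf-16-be` and `utf-16` codec names, where the plain `utf-16` codec writes a BOM in the machine's native byte order when encoding and uses the BOM to pick the byte order when decoding.

```python
import codecs

text = "A"  # U+0041

little = text.encode("utf-16-le")   # low byte first, no BOM
big = text.encode("utf-16-be")      # high byte first, no BOM
with_bom = text.encode("utf-16")    # BOM first, then native byte order

print(little)    # b'A\x00'
print(big)       # b'\x00A'

# The BOM is U+FEFF serialized in the chosen byte order.
starts_with_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print(starts_with_bom)

# Decoding with the plain "utf-16" codec reads and strips the BOM.
print(with_bom.decode("utf-16") == text)
```

When the byte order is known in advance, the explicit `-le` and `-be` codecs avoid the BOM entirely; the plain `utf-16` codec is the one to reach for when the data carries a BOM.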

Related tags

The unicode character set it serializes
Other UTFs: utf-8, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36

1193 questions

4 answers

What version of Unicode is supported by which .NET platform and on which version of Windows in regards to character classes?

Updated question ¹ With regards to character classes, comparison, sorting, normalization and collations, what Unicode version or versions are supported by which .NET platforms? Original question I remember somewhat vaguely having read that .NET…

c# .net utf-16 ucs2 astral-plane

asked Feb 06 '12 at 15:04

Abel

3 answers

Does Unicode have a defined maximum number of code points?

I have read many articles in order to know what is the maximum number of the Unicode code points, but I did not find a final answer. I understood that the Unicode code points were minimized to make all of the UTF-8 UTF-16 and UTF-32 encodings able…

unicode utf-8 utf-16 codepoint utf-32

asked Dec 11 '14 at 05:26

user4344762

2 answers

Convert UTF-16 to UTF-8 and remove BOM?

We have a data entry person who encoded in UTF-16 on Windows and would like to have utf-8 and remove the BOM. The utf-8 conversion works but BOM is still there. How would I remove this? This is what I currently…

python unicode utf-8 utf-16

asked Jan 11 '12 at 22:09

timpone

1 answer

Valid Locale Names

How do you find valid locale names? I am currently using MAC OS X. But information about other platforms would also be useful. #include #include int main(int argc,char* argv[]) { try { std::wifstream data; …

c++ locale utf-16

asked Dec 17 '09 at 15:53

Martin York

6 answers

Emoji value range

I was trying to take out all emoji chars out of a string (like a sanitizer). But I cannot find a complete set of emoji values. What is the complete set of emoji chars' UTF16 values?

encoding utf-16 emoji

asked May 26 '15 at 22:33

SL988

2 answers

How does Java store UTF-16 characters in its 16-bit char type?

According to the Java SE 7 Specification, Java uses the Unicode UTF-16 standard to represent characters. When imagining a String as a simple array of 16-bit variables each containing one character, life is simple. Unfortunately, there are code…

java variables unicode encoding utf-16

asked Oct 28 '12 at 19:57

Kierrow

5 answers

How do I encode/decode UTF-16LE byte arrays with a BOM?

I need to encode/decode UTF-16 byte arrays to and from java.lang.String. The byte arrays are given to me with a Byte Order Marker (BOM), and I need to encode byte arrays with a BOM. Also, because I'm dealing with a Microsoft client/server, I'd like…

java unicode utf-16 byte-order-mark

asked May 18 '09 at 19:55

Jared Oberhaus

5 answers

How to read utf16 text file to string in golang?

I can read the file into a byte array, but when I convert it to a string it treats the UTF-16 bytes as ASCII. How do I convert it correctly? package main import ("fmt" "os" "bufio" ) func main(){ // read whole the file f, err := os.Open("test.txt") …

unicode go readline utf-16

asked Apr 03 '13 at 09:38

CL So

3 answers

What Character Encoding is best for multinational companies

If you had a website that was to be translated into every language in the world, and therefore had a database with all these translations, what character encoding would be best? UTF-128? If so, do all browsers understand the chosen encoding? Is…

utf-8 character-encoding utf-16 utf-32

asked Apr 20 '11 at 15:43

HGPB

2 answers

Python - Decode UTF-16 file with BOM

I have a UTF-16 LE file with BOM. I'd like to flip this file in to UTF-8 without BOM so I can parse it using Python. The usual code that I use didn't do the trick, it returned unknown characters instead of the actual file contents. f =…

python file encoding utf-8 utf-16

asked Mar 17 '14 at 15:52

Dustin

3 answers

Using JNA to get/set application identifier

Following up on my previous question concerning the Windows 7 taskbar, I would like to diagnose why Windows isn't acknowledging that my application is independent of javaw.exe. I presently have the following JNA code to obtain the…

java windows-7 jna utf-16

asked Dec 15 '09 at 14:13

Paul Lammertsma

5 answers

Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI

I'm working on an English-only C++ program for Windows where we were told "always use std::wstring", but it seems like nobody on the team really has much of an understanding beyond that. I already read the question titled "std::wstring VS…

c++ unicode utf-8 utf-16 wstring

asked Mar 27 '10 at 00:53

Dave

6 answers

UnicodeDecodeError when performing os.walk

I am getting the error: 'ascii' codec can't decode byte 0x8b in position 14: ordinal not in range(128) when trying to do os.walk. The error occurs because some of the files in a directory have the 0x8b (non-utf8) character in them. The files come…

python unicode encoding utf-8 utf-16

asked Feb 14 '14 at 06:26

Scott

3 answers

What should I use? UTF8 or UTF16?

I have to distribute my app internationally. Let's say I have a control (like a memo) where the user enters some text. The user can be Japanese, Russian, Canadian, etc. I want to save the string to disk as TXT file for later use. I will use MY OWN…

delphi utf-8 utf-16

asked Mar 22 '12 at 08:22

Gabriel

5 answers

How to reduce memory footprint on .NET string intensive applications?

I have an application that has ~1,000,000 strings in memory for performance reasons. My application consumes ~200 MB RAM. I want to reduce the amount of memory consumed by the strings. I know .NET represents strings in UTF-16 encoding (2 byte per…

c# .net string utf-8 utf-16

asked Mar 09 '12 at 18:45

DxCK
