Questions tagged [utf-16]

UTF-16 is a character encoding that represents Unicode code points using either 2 or 4 bytes per character.

UTF-16 is a character encoding that represents each Unicode code point as one or two 16-bit code units, i.e. as a byte sequence of either two or four bytes. It is therefore a variable-width character encoding.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
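
As a rough illustration of that algorithm, the Python sketch below turns a single code point into its UTF-16 code units (16-bit values). The function name `encode_utf16_units` is made up for this example and is not part of any library; treat it as a minimal sketch rather than a reference implementation.

```python
def encode_utf16_units(code_point):
    """Return the UTF-16 code units (16-bit integers) for one code point.

    Hypothetical helper written for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Code points in the BMP fit in a single 16-bit code unit.
        return [code_point]
    # Supplementary code points: subtract 0x10000, then split the
    # remaining 20 bits into a high and a low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


units = encode_utf16_units(0x1F600)   # U+1F600, outside the BMP
print([hex(u) for u in units])        # ['0xd83d', '0xde00']
```

The result can be cross-checked against Python's built-in codec: `"\U0001F600".encode("utf-16-be")` yields the same two code units as four bytes.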

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and UTF-16 with a byte order mark (BOM) that signals the byte order (see endianness).
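
To make the difference between the flavors concrete, here is a small Python sketch that encodes the same two-character string with each of them. It relies only on the standard `utf-16-be`, `utf-16-le` and `utf-16` codecs; the sample string is an arbitrary choice for this example.

```python
import codecs

text = "A\U0001F600"  # one BMP character plus one supplementary character

big_endian = text.encode("utf-16-be")     # no BOM, high byte first
little_endian = text.encode("utf-16-le")  # no BOM, low byte first
with_bom = text.encode("utf-16")          # BOM first, then native byte order

print(big_endian.hex(" "))     # 00 41 d8 3d de 00
print(little_endian.hex(" "))  # 41 00 3d d8 00 de

# The plain "utf-16" codec writes a BOM so that a decoder can detect the order.
print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
print(with_bom.decode("utf-16") == text)                           # True
```

Without a BOM, the byte order has to be known from context, which is why decoding the BOM-less bytes with the wrong codec produces different characters or an error.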

Related tags

unicode: the character set it serializes
Other UTFs: utf-8, utf-32; rarely used: utf-7, utf-1

1193 questions

4 answers

Displaying UTF-16 characters on web browser

I printed some UTF-16 encoded characters and tried to display them in Firefox, and it displayed them as �. So I went to Tools->Encoding and changed the encoding from UTF-8 to UTF-16 (I also tried changing the charset directly in the HTML). However, when I…

html firefox utf-8 character-encoding utf-16

asked Oct 30 '12 at 05:08

allenylzhou

4 answers

wchar_t for UTF-16 on Linux?

Does it make any sense to store UTF-16 encoded text using wchar_t* on Linux? The obvious problem is that wchar_t is four bytes on Linux and UTF-16 usually takes two (or sometimes two groups of two) bytes per character. I'm trying to use a…

c linux unicode utf-16 wchar-t

asked Oct 12 '12 at 19:09

user708549

3 answers

Size of wchar_t* for surrogate pair (Unicode character out of BMP) on Windows

I have encountered an interesting issue on Windows 8. I tested whether I can represent Unicode characters outside the BMP with wchar_t* strings. The following test code produced unexpected results for me: const wchar_t* s1 = L"a"; const wchar_t* s2…

c++ windows unicode utf-16

asked Jul 16 '12 at 12:05

Mark Vincze

7 answers

Are there any dangers to working internally in UTF-8 and then converting to UTF-16 only when needed in Windows?

Visual Studio tries to insist on using TCHARs, which, when compiled with the UNICODE option, basically ends up using the wide versions of the Windows and other APIs. Is there then any danger to using UTF-8 internally in the application (which…

c++ windows utf-8 cross-platform utf-16

asked Mar 08 '12 at 19:50

Carl

1 answer

UTF-16 Encoding

Jani ALOK AshuTosh I have the XML parser which supports UTF-8 encoding only else it gives SAX parser exception. How can …

java xml utf-16

asked Feb 23 '12 at 11:50

Alok Chaudhary

2 answers

RE2 and UTF16 (or UCS-2)

RE2 is great. Fast and deterministic. However, it supports only UTF8. My strings are natively UTF16, and converting back and forth would kill performance. How difficult would it be to implement native UTF16 capability in RE2? How difficult would it…

regex unicode utf-16 re2

asked Feb 07 '12 at 06:15

MustafaM

2 answers

Can Character represent all Unicode code points?

Since a Java char is 16 bits long, I am wondering how it can represent every Unicode code point. It can only represent 65536 code points, is that right?

java unicode utf-16

asked Jan 07 '12 at 08:19

user705414

1 answer

How to Convert UTF-16 to UTF-32 and Print the Resulting wchar_t in C?

I'm trying to print out a string of UTF-16 characters. I posted this question a while back, and the advice given was to convert to UTF-32 using iconv and print it as a string of wchar_t. I've done some research and managed to code the following: //…

c utf-16 iconv utf-32

asked Dec 11 '11 at 17:24

Edwin Lee

3 answers

How can I store UTF-16 characters in a Postgres database?

I am trying to store some text (e.g. č) in a Postgres database, however when retrieving this value, it appears on screen as ?. I'm not sure why it does this, I was under the impression that it was a character that wasn't supported in UTF-8, but was…

.net postgresql encoding utf-16 surrogate-pairs

asked Dec 09 '11 at 16:29

Mr Shoubs

1 answer

Wide character Windows

Windows defines the wchar_t symbol to be 16 bits long. However, the UTF-16 encoding used tells us that some symbols may actually be encoded with 4 bytes (32 bits). Does this mean that if I'm developing an application for Windows, the following…

c++ windows unicode utf-16

asked Dec 04 '11 at 13:39

Yippie-Ki-Yay

2 answers

Firefox and UTF-16 encoding

I'm building a website with the encoding UTF-16. It means that every file (html, jsp) is encoded in UTF-16 and I set in the head of every HTML page : My index page is correctly…

firefox encoding utf-16

asked Nov 17 '11 at 14:59

user376112

1 answer

Why was the Python Unicode internal format implemented as described in PEP 100?

http://www.python.org/dev/peps/pep-0100/ PEP 100 states that the internal format, Python Unicode, holds UTF-16 encodings, but addresses the values as UCS-2 (or UCS-4 when compiled with flag --enable-unicode=ucs4). Why wasn't UTF-16 chosen (a…

python unicode encoding utf-16 ucs2

asked Nov 05 '11 at 20:53

mkelley33

2 answers

Error which "shouldn't happen" caused by MalformedInputException when reading file to string with UTF-16

Path file = Paths.get("New Text Document.txt"); try { System.out.println(Files.readString(file, StandardCharsets.UTF_8)); System.out.println(Files.readString(file, StandardCharsets.UTF_16)); } catch (Exception e) { …

java utf-16 java-17

asked May 05 '22 at 13:07

H.v.M.

3 answers

Detect (or best guess of) incoming string encoding in Java

I was wondering if there are known methods to detect (or give a best guess of) the encoding of a particular string in Java. I know that you always need some additional meta-data to tell what the encoding is, and there are best practices etc., but…

java encoding utf-8 decoding utf-16

asked Jul 21 '11 at 19:07

SuPra

2 answers

Get UTF-16 code unit at a given index in ABAP

I want to get the UTF-16 code unit at a given index in ABAP. The same can be done in JavaScript with charCodeAt(). For example, "d".charCodeAt(); will give back 100. Is there similar functionality in ABAP?

encoding abap utf-16

asked Mar 22 '21 at 21:11

schmelto
