Questions tagged [utf-16]

UTF-16 is a character encoding that represents each Unicode code point using either 2 or 4 bytes (one or two 16-bit code units).

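UTF-16 encodes each Unicode code point as a sequence of either two or four bytes, so it is a variable-width character encoding. Code points in the Basic Multilingual Plane (U+0000 to U+FFFF, excluding the surrogate range U+D800 to U+DFFF) take one 16-bit code unit, and supplementary code points (U+10000 to U+10FFFF) take two, called a surrogate pair.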

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
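A minimal Python sketch of that encoding step, written for illustration only (the function name `encode_utf16_units` is hypothetical, not taken from RFC 2781). Code points below U+10000 map to a single code unit; larger ones have 0x10000 subtracted, and the remaining 20 bits are split across a high and a low surrogate.

```python
def encode_utf16_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as a list of 16-bit UTF-16 code units."""
    # Surrogate code points and values above U+10FFFF cannot be encoded.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp < 0x10000:
        # BMP code point: one code unit holding the value itself.
        return [cp]
    u = cp - 0x10000             # 20-bit value
    high = 0xD800 | (u >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 | (u & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return [high, low]


print([hex(u) for u in encode_utf16_units(0x1F604)])  # ['0xd83d', '0xde04']
```

For U+1F604 this yields the pair 0xD83D 0xDE04, the same units Python's built-in `utf-16-be` codec produces.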

There are three flavors of UTF-16: little-endian, big-endian, and with a byte order mark (BOM) that declares the byte order (see endianness).
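As an illustration in Python, the three flavors correspond to the `utf-16-le`, `utf-16-be` and `utf-16` codecs. Python's BOM-writing `utf-16` encoder uses the machine's native byte order, so the first two bytes it emits depend on the platform.

```python
text = "Hi"

le = text.encode("utf-16-le")   # little-endian, no BOM
be = text.encode("utf-16-be")   # big-endian, no BOM
bom = text.encode("utf-16")     # BOM first, then native byte order

print(le)  # b'H\x00i\x00'
print(be)  # b'\x00H\x00i'
print(bom[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True

# Decoding with plain "utf-16" reads the BOM to pick the byte order and strips it.
print(bom.decode("utf-16") == text)  # True
```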

Related tags

unicode: the character set it serializes
Other UTFs: utf-8, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36

1193 questions

2 answers

Converting a UTF-16LE Elixir bitstring into an Elixir String

Given an Elixir bitstring encoded in UTF-16LE: <<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>> how can I get this converted into a readable Elixir String (it spells out "Devastator")? The closest I've gotten is…

utf-8 elixir utf-16 utf-16le

asked Sep 29 '16 at 14:42

user701847
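The same bytes decoded in Python, as a language-neutral illustration (this is not Elixir code): each character is one little-endian 16-bit code unit, and the final `0, 0` pair is a NUL code unit that has to be stripped to get the plain word.

```python
# The bitstring from the question, as raw bytes.
data = bytes([68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0,
              97, 0, 116, 0, 111, 0, 114, 0, 0, 0])

text = data.decode("utf-16-le")  # 'Devastator\x00': the last two bytes are a NUL
text = text.rstrip("\x00")       # drop the trailing NUL terminator
print(text)  # Devastator
```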

2 answers

UTF-16 decoder not working as expected

I have a part of my Unicode library that decodes UTF-16 into raw Unicode code points. However, it isn't working as expected. Here's the relevant part of the code (omitting UTF-8 and string manipulation stuff): typedef struct string { unsigned…

c decoding utf-16

asked Sep 24 '10 at 13:02

Delan Azabani

79,602 reputation; 28 gold, 170 silver, 210 bronze badges
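The asker's C code is truncated, so this is only a generic Python sketch of the logic a UTF-16 decoder needs (`decode_utf16_units` is a hypothetical name). A high surrogate (0xD800 to 0xDBFF) must be followed by a low surrogate (0xDC00 to 0xDFFF), and the pair is recombined by reversing the encoding arithmetic, including adding back 0x10000.

```python
def decode_utf16_units(units: list[int]) -> list[int]:
    """Decode 16-bit UTF-16 code units into Unicode code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: the next unit must be a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at index {i}")
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate is malformed.
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            # Any other unit is a BMP code point on its own.
            out.append(u)
            i += 1
    return out


print([hex(cp) for cp in decode_utf16_units([0x41, 0xD83D, 0xDE04])])  # ['0x41', '0x1f604']
```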

2 answers

How to best deal with Windows' 16-bit wchar_t ugliness?

I'm writing a wrapper layer to be used with mingw which provides the application with a virtual UTF-8 environment. Functions which deal with filenames are wrappers which convert from UTF-8 and call the corresponding "_w" functions, and so on. The…

c windows utf-8 mingw utf-16

asked Jul 12 '10 at 13:11

R.. GitHub STOP HELPING ICE

208,859 reputation; 35 gold, 376 silver, 711 bronze badges

1 answer

fatal error: high- and low-surrogate code points are not valid Unicode scalar values

Sometimes initializing a UnicodeScalar with a value like 57292 yields the following error: fatal error: high- and low-surrogate code points are not valid Unicode scalar values. What is this error, why does it occur and how can I prevent it in…

string swift unicode utf-16 utf

asked Aug 22 '15 at 16:34

Vatsal Manot

17,695 reputation; 9 gold, 44 silver, 80 bronze badges
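57292 is 0xDFCC, which lies in the low-surrogate range 0xDC00 to 0xDFFF. Surrogates exist only as UTF-16 code units; they are not Unicode scalar values, which is what Swift's `UnicodeScalar` represents. A Python sketch of the validity check (the helper name `is_unicode_scalar` is hypothetical):

```python
def is_unicode_scalar(value: int) -> bool:
    """True if value is a Unicode scalar value: in range and not a surrogate."""
    return 0 <= value <= 0x10FFFF and not 0xD800 <= value <= 0xDFFF


print(hex(57292), is_unicode_scalar(57292))  # 0xdfcc False
```

Values taken from UTF-16 data need to be decoded as surrogate pairs before they are treated as scalar values.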

3 answers

Javascript: unicode character to BYTE based hex escape sequence (NOT surrogates)

In JavaScript I am trying to turn Unicode into byte-based hex escape sequences that are compatible with C, i.e. 😄 becomes \xF0\x9F\x98\x84 (correct), NOT JavaScript surrogates, not \uD83D\uDE04 (wrong). I cannot figure out the math relationship…

javascript unicode utf-8 hex utf-16

asked Aug 01 '15 at 12:44

ck_

3,353 reputation; 5 gold, 31 silver, 33 bronze badges
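The math in question has two steps: recombine the surrogate pair into a code point, then split that code point into UTF-8 bytes. A Python sketch of both steps (the helper names are hypothetical, and `utf8_bytes` assumes its input is already a valid scalar value, so it does no range checking):

```python
def surrogates_to_code_point(high: int, low: int) -> int:
    # Undo the UTF-16 split: 10 bits from each surrogate, plus 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf8_bytes(cp: int) -> bytes:
    # Assumes cp is a valid Unicode scalar value (no range checks).
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


cp = surrogates_to_code_point(0xD83D, 0xDE04)
escaped = "".join(f"\\x{b:02X}" for b in utf8_bytes(cp))
print(hex(cp), escaped)  # 0x1f604 \xF0\x9F\x98\x84
```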

1 answer

Opening and reading UTF-16 files in Python

Recently I have been having trouble opening specific UTF-16 encoded files in Python. I have tried the following: import codecs f = codecs.open('filename.data', 'r', 'utf-16-be') contents = f.read() but I get the following error: UnicodeDecodeError:…

python encoding utf-16

asked Jul 06 '15 at 17:15

DJMcCarthy12

3,819 reputation; 8 gold, 28 silver, 34 bronze badges
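The traceback is cut off, so this is an assumption: one possible cause is a mismatch between the codec named and the file's actual byte order, for example a file that starts with a little-endian BOM being read as `utf-16-be`. A Python 3 sketch that inspects the first two bytes and then lets the `utf-16` codec pick the byte order from the BOM (the sample file is created by the sketch itself):

```python
import os
import tempfile

# Create a sample UTF-16 file with a BOM (the encoder writes one automatically).
path = os.path.join(tempfile.mkdtemp(), "sample.data")
with open(path, "w", encoding="utf-16") as f:
    f.write("hello\n")

# Inspect the BOM: b'\xff\xfe' means little-endian, b'\xfe\xff' big-endian.
with open(path, "rb") as f:
    head = f.read(2)
print(head)

# "utf-16" uses the BOM to choose the byte order and strips it from the text.
with open(path, encoding="utf-16") as f:
    contents = f.read()
print(repr(contents))  # 'hello\n'
```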

3 answers

grep and tail -f for a UTF-16 binary file - trying to use simple awk

How can I achieve the equivalent of: tail -f file.txt | grep 'regexp' to only output the buffered lines that match a regular expression such as 'Result' from the file type: $ file file.txt file.txt:Little-endian UTF-16 Unicode text, with CRLF line…

awk grep cygwin utf-16 tail

asked Jun 23 '15 at 22:20

Alexander McFarlane

10,643 reputation; 9 gold, 59 silver, 100 bronze badges
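In little-endian UTF-16, every ASCII character is followed by a 0x00 byte, so a byte-oriented pattern like `Result` does not match the raw file contents. A Python sketch of the matching step only; it does not follow a growing file the way `tail -f` does, and the helper name `grep_utf16le` is hypothetical:

```python
import re


def grep_utf16le(data: bytes, pattern: str) -> list[str]:
    """Return the lines of UTF-16LE-encoded data that match pattern."""
    # A leading BOM, if present, survives as '\ufeff' at the start of line one.
    text = data.decode("utf-16-le")
    # splitlines() treats CRLF as a single line break.
    return [line for line in text.splitlines() if re.search(pattern, line)]


sample = "Start\r\nResult: 42\r\nDone\r\n".encode("utf-16-le")
print(b"Result" in sample)              # False: the bytes are interleaved with NULs
print(grep_utf16le(sample, "Result"))   # ['Result: 42']
```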

2 answers

C#: how to get first character of a string?

We already have a question about getting the first 16-bit char of a string. This includes the question code: MyString.ToCharArray[0] and accepted answer code: MyString[0] I guess there are some uses for that, but when the string contains text we…

c# string unicode utf-16 surrogate-pairs

asked Apr 24 '15 at 04:17

hippietrail

15,848 reputation; 18 gold, 99 silver, 158 bronze badges
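C# strings are sequences of UTF-16 code units, so indexing returns a code unit; for a string that begins with a character outside the Basic Multilingual Plane, that unit is only the high surrogate. The Python illustration below reproduces this by reading the first 16-bit unit from the UTF-16LE bytes, whereas Python's own `str` indexes by code point. Even code points are not always user-perceived characters, since a base letter plus combining marks spans several of them.

```python
s = "\U0001F604abc"  # starts with U+1F604, outside the BMP

units = s.encode("utf-16-le")
first_unit = int.from_bytes(units[:2], "little")
print(hex(first_unit))        # 0xd83d: a high surrogate, not a whole character
print(len(units) // 2)        # 5 UTF-16 code units
print(len(s))                 # 4 code points
print(s[0] == "\U0001F604")   # True: Python indexes by code point
```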

4 answers

How to reverse a string that contains surrogate pairs

I have written this method to reverse a string public string Reverse(string s) { if(string.IsNullOrEmpty(s)) return s; TextElementEnumerator enumerator = …

c# string reverse utf-16 surrogate-pairs

asked Mar 01 '14 at 13:00

Sachin Kainth

45,256 reputation; 81 gold, 201 silver, 304 bronze badges

2 answers

Why does JQuery only display HTML characters when enclosed in other tags?

I'm curious about why this JQuery renders the full block HTML character: var html = $(' █ '); $("body").append(html) But this doesn't: var html = $('█'); $("body").append(html) Is there a way to render one single special…

jquery html utf-16

asked Nov 03 '13 at 21:21

user2950747

1 answer

UnicodeEncodeError: 'charmap' codec can't encode character character maps to <undefined>

I have a problem with writing to a file in Unicode. I am using Python 2.7.3. It gives me this error: UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 1006: character maps to <undefined> Here is a sample of my code:…

python python-2.7 unicode utf-8 utf-16

asked Jul 12 '13 at 11:03

yozhik

4,644 reputation; 14 gold, 65 silver, 98 bronze badges
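`'charmap'` is the name Python reports for its table-driven single-byte codecs such as `cp437` and `cp1252`, and this error means the target code page has no byte for U+2019 (RIGHT SINGLE QUOTATION MARK). Which code page the asker's setup used is not stated; `cp437` below is an assumption for illustration, and the sketch uses Python 3 syntax. Naming an encoding that covers all of Unicode, such as UTF-8, avoids the error.

```python
text = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the character in the error

failed = False
try:
    text.encode("cp437")  # assumed code page; it has no mapping for U+2019
except UnicodeEncodeError as e:
    failed = True
    print(e)  # 'charmap' codec can't encode character '\u2019' ...

# UTF-8 can encode every Unicode scalar value.
print(text.encode("utf-8"))  # b'\xe2\x80\x99'
```

When writing a file, the same idea means passing an explicit `encoding="utf-8"` to `open()` instead of relying on the platform default.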

2 answers

Will UTF-8 strings always be shorter than UTF-16?

If I have 2 strings of the same text, one UTF-8, and the other UTF-16. Is it safe to assume the UTF-8 string will always be smaller, or the same size, as the UTF-16 one? (byte wise)

text unicode encoding utf-8 utf-16

asked Jan 04 '13 at 14:56

Josh

6,046 reputation; 11 gold, 52 silver, 83 bronze badges
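The answer is no. UTF-8 uses 1 byte for U+0000 to U+007F, 2 bytes up to U+07FF, 3 bytes up to U+FFFF and 4 bytes beyond, while UTF-16 uses 2 bytes up to U+FFFF and 4 bytes beyond. Text dominated by characters in the U+0800 to U+FFFF range, such as most CJK text, is therefore larger in UTF-8. A quick Python comparison (the `-le` codec is used so that no BOM is counted):

```python
samples = ["hello", "h\u00e9llo", "日本語", "\U0001F604"]

# (UTF-8 size, UTF-16 size) in bytes for each sample string.
sizes = {s: (len(s.encode("utf-8")), len(s.encode("utf-16-le"))) for s in samples}
for s, (u8, u16) in sizes.items():
    print(f"{s!r}: UTF-8 {u8} bytes, UTF-16 {u16} bytes")
```

ASCII text is half the size in UTF-8, CJK text is 50% larger, and supplementary characters such as emoji cost 4 bytes in both.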

2 answers

VS 2012 Encoding in the declaration 'utf-16' does not match document 'utf-8'

When I open Visual Studio 2012, I am greeted with the message "Visual Studio The encoding in the declaration 'utf-16' does not match the encoding of the document 'utf-8'". Does anyone know why this might be happening? Or what troubleshooting I…

visual-studio-2012 utf-8 utf-16 file-encodings

asked Nov 19 '12 at 18:49

Ryan Gates

4,501 reputation; 6 gold, 50 silver, 90 bronze badges

2 answers

How to convert UTF8 string to UTF16

I'm getting a UTF8 string by processing a request sent by a client application. But the string is really UTF16. What I get in my local string is each letter followed by a \0 character. I need to convert that String into UTF16. Sample…

java utf-8 utf-16

asked Nov 16 '12 at 07:26

dinesh707

12,106 reputation; 22 gold, 84 silver, 134 bronze badges
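A letter followed by `\0` is what UTF-16LE bytes look like when they are decoded one byte per character. The asker uses Java, so this Python sketch is only an illustration, and it assumes the bytes were decoded with a single-byte or UTF-8 decoder that left ASCII letters and NULs intact (the `garbled` value is hypothetical). The repair works only when no bytes were lost in the first decode; the cleaner fix is to decode the original bytes as UTF-16LE where they are read, which in Java is `new String(bytes, StandardCharsets.UTF_16LE)`.

```python
# Hypothetical received value: UTF-16LE bytes decoded one byte per character,
# so every ASCII letter is followed by a NUL (U+0000).
garbled = "D\x00e\x00v\x00"

# latin-1 maps U+0000..U+00FF back to the same single bytes, recovering the
# original byte sequence, which is then decoded with the correct codec.
fixed = garbled.encode("latin-1").decode("utf-16-le")
print(fixed)  # Dev
```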

3 answers

How can I check for the existence of UTF-16 filenames in Perl?

I have a textfile encoded in UTF-16. Each line contains a number of columns separated by tabs. For those who care, the file is a playlist TXT export from iTunes. Column #27 contains a filename. I am reading it using Perl 5.8.8 in Linux using code…

perl utf-16

asked Aug 22 '09 at 20:13

blt04
