Questions tagged [utf-16]

UTF-16 is a character encoding that represents Unicode code points using either 2 or 4 bytes per character.

UTF-16 is a character encoding that describes unicode code points in byte sequences of either two or four bytes. It is therefore a variable-width character encoding.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.

There are three flavors of UTF-16, little-endian, big-endian and with BOM (see endianness).

Related tags

The unicode character set it serializes
Other UTFs: utf-8 utf-16, utf-32, rarely used: utf-7 utf-1 utf-18 utf-36

1193 questions

votes

3 answers

Converting xml from UTF-16 to UTF-8 using PowerShell

What's the easiest way to convert XML from UTF16 to a UTF8 encoded file?

xml powershell utf-8 utf-16

asked Apr 15 '09 at 05:45

David Gardiner

16,892
20
80
117

votes

3 answers

How to convert string to unicode(UTF-8) string in Swift?

How to convert string to unicode(UTF-8) string in Swift? In Objective I could write smth like that: NSString *str = [[NSString alloc] initWithUTF8String:[strToDecode cStringUsingEncoding:NSUTF8StringEncoding]]; how to do smth similar in Swift?

ios swift unicode utf-8 utf-16

asked May 07 '16 at 10:24

Dirder

votes

5 answers

javascript and string manipulation w/ utf-16 surrogate pairs

I'm working on a twitter app and just stumbled into the world of utf-8(16). It seems the majority of javascript string functions are as blind to surrogate pairs as I was. I've got to recode some stuff to make it wide character aware. I've got this…

javascript string unicode twitter utf-16

asked Jul 30 '11 at 20:45

BentFX

2,746
5
25
30

votes

2 answers

Which Languages Does UTF-8 Not Support?

I'm working on internationalizing one of my programs for work. I'm trying to use foresight to avoid possible issues or redoing the process down the road. I see references for UTF-8, UTF-16 and UTF-32. My question is two parts: What languages does…

utf-8 internationalization c++builder utf-16 utf

asked Mar 27 '13 at 16:16

James Oravec

19,579
27
94
160

votes

1 answer

utf-16 file seeking in python. how?

For some reason i can not seek my utf16 file. It produces 'UnicodeException: UTF-16 stream does not start with BOM'. My code: f = codecs.open(ai_file, 'r', 'utf-16') seek = self.ai_map[self._cbClass.Text] #seek is valid int f.seek(seek) while…

python utf-16

asked Jul 21 '11 at 16:17

marrat

votes

5 answers

Unicode string normalization in C/C++

Am wondering how to normalize strings (containing utf-8/utf-16) in C/C++. In .NET there is a function String.Normalize . I used UTF8-CPP in the past but it does not provide such a function. ICU and Qt provide string normalization but I prefer…

c++ unicode utf-8 utf-16 unicode-normalization

asked Feb 03 '11 at 10:18

Ghassen Hamrouni

3,138
2
20
31

votes

2 answers

Which encoding does Java uses UTF-8 or UTF-16?

I've already read the following posts: What is the Java's internal represention for String? Modified UTF-8? UTF-16? https://docs.oracle.com/javase/8/docs/api/java/lang/String.html Now consider the code given below: public static void main(String[]…

java encoding utf-8 default utf-16

asked Oct 10 '16 at 09:26

Nitin Bhardwaj

votes

4 answers

Java charAt used with characters that have two code units

From Core Java, vol. 1, 9th ed., p. 69: The character ℤ requires two code units in the UTF-16 encoding. Calling String sentence = "ℤ is the set of integers"; // for clarity; not in book char ch = sentence.charAt(1) doesn't return a space but the…

java unicode utf-16 surrogate-pairs astral-plane

asked Jan 04 '13 at 03:05

Patrick Brinich-Langlois

1,381
1
15
29

votes

3 answers

Sorting the characters in a UTF-16 string in Java

TLDR Java uses two characters to represent UTF-16. Using Arrays.sort (unstable sort) messes with character sequencing. Should I convert char[] to int[] or is there a better way? Details Java represents a character as UTF-16. But the Character class…

java string sorting utf-16

asked Apr 23 '19 at 02:00

dingy

votes

4 answers

How do I compare each character of a String while accounting for characters with length > 1?

I have a variable string that might contain any unicode character. One of these unicode characters is the han . The thing is that this "han" character has "".length() == 2 but is written in the string as a single character. Considering the code…

java string unicode character-encoding utf-16

asked Jun 07 '15 at 03:40

Fagner Brack

2,365
4
33
69

votes

3 answers

UTF-16 string terminator

What is the string terminator sequence for a UTF-16 string? EDIT: Let me rephrase the question in an attempt to clarify. How's does the call to wcslen() work?

c string unicode utf-16 unicode-string

asked May 07 '11 at 20:55

Ray

votes

2 answers

Can wprintf output be properly redirected to UTF-16 on Windows?

In a C program I'm using wprintf to print Unicode (UTF-16) text in a Windows console. This works fine, but when the output of the program is redirected to a log file, the log file has a corrupted UTF-16 encoding. When redirection is done in a…

redirect encoding utf-16

asked Aug 12 '15 at 17:27

Erwin Waterlander

votes

6 answers

UTF-16 to UTF-8 conversion (for scripting in Windows)

what is the best way to convert a UTF-16 files to UTF-8? I need to use this in a cmd script.

windows utf-8 batch-file cmd utf-16

asked Nov 05 '08 at 14:54

Grzenio

35,875
47
158
240

votes

4 answers

Strange unicode characters when reading in file in node.js app

I am attempting to write a node app that reads in a set of files, splits them into lines, and puts the lines into an array. Pretty simple. It works on quite a few files except some SQL files that I am working with. For some reason I seem to be…

javascript node.js unicode utf-16 utf

asked Jan 18 '13 at 16:36

d512

32,267
28
81
107

votes

2 answers

How do I create a string with a surrogate pair inside of it?

I saw this post on Jon Skeet's blog where he talks about string reversing. I wanted to try the example he showed myself, but it seems to work... which leads me to believe that I have no idea how to create a string that contains a surrogate pair…

c# string utf-16 utf-32 surrogate-pairs

asked Jan 15 '13 at 22:06

michael

14,844
28
89
177

Prev 1 2 3

…

79 80 Next