Questions tagged [utf-16]

UTF-16 is a character encoding that represents each Unicode code point as one or two 16-bit code units, that is, as a byte sequence of either two or four bytes. It is therefore a variable-width character encoding.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
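
As an illustration of that algorithm, here is a minimal Python sketch of how a single code point maps to UTF-16 code units. The function name `utf16_code_units` is hypothetical and chosen only for this example. In practice you would normally use a built-in codec instead.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units (as a list of ints) for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogate code points are reserved for UTF-16 itself.
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Basic Multilingual Plane: one 16-bit code unit with the same value.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 | (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


print([hex(u) for u in utf16_code_units(0x20AC)])   # one code unit
print([hex(u) for u in utf16_code_units(0x10437)])  # surrogate pair
```

A code point below U+10000 takes one code unit (two bytes), and anything above it takes a surrogate pair (four bytes).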

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and UTF-16 with a byte order mark (BOM) that signals which byte order is used (see endianness).
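
The difference between the flavors can be seen with Python's built-in codecs. This is a small sketch, not a complete treatment: the sample string is arbitrary, and the byte order that the plain `"utf-16"` codec writes after the BOM depends on the platform.

```python
import codecs

text = "A\U00010437"  # one BMP character and one supplementary character

le = text.encode("utf-16-le")     # little-endian, no BOM
be = text.encode("utf-16-be")     # big-endian, no BOM
with_bom = text.encode("utf-16")  # BOM first, then platform byte order

print(le.hex(" "))
print(be.hex(" "))
print(with_bom.hex(" "))

# When decoding with the generic "utf-16" codec, a leading BOM
# selects the byte order and is stripped from the result.
assert (codecs.BOM_UTF16_BE + be).decode("utf-16") == text
assert (codecs.BOM_UTF16_LE + le).decode("utf-16") == text
```

Without a BOM the decoder has no reliable way to know the byte order, which is why the explicit `utf-16-le` and `utf-16-be` codecs exist.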

Related tags

unicode: the character set it serializes
Other UTFs: utf-8, utf-16, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36

1193 questions

3 answers

Templated string class use of strcmp, strcpy and strlen

I overheard a discussion some time ago about how, when creating a templated string class that can make use of UTF8 and UTF16, you should not use strcmp, strcpy and strlen. From what I recall, you are supposed to use…

c++ string utf-8 implementation utf-16

asked Dec 31 '11 at 06:07

mmurphy

1 answer

Differences in string class implementations

Why are string classes implemented in several different ways, and what are the advantages and disadvantages? I have seen it done several different ways: only using a simple char (most basic way); supporting UTF8 and UTF16 through a templated string,…

c++ string utf-8 implementation utf-16

asked Dec 27 '11 at 21:46

mmurphy

1 answer

gnu-binutils-strings utf-8 instead of utf-16 or ascii

I've noticed gnu-binutils-strings can print out UTF-16 content in a file. Is it possible for the program to print out UTF-8 strings? If so, which arguments are appropriate? I'm working in a Python environment using subprocess and would like to work…

python unix utf-8 utf-16 binutils

asked Oct 23 '11 at 02:35

ct_

1 answer

In Qt how do QTextCodec::codecForName("UTF-16") and codecForName("UTF-32") decide the endianness to use?

In the Qt documentation it states that (among others) the following Unicode string encodings are supported: UTF-8 UTF-16 UTF-16BE UTF-16LE UTF-32 UTF-32BE UTF-32LE Due to the three different codecs listed for 2 and 4 octet encoded Unicode, I was…

qt endianness utf-16 byte-order-mark utf-32

asked Sep 15 '11 at 11:30

Samuel Harmer

0 answers

How to create string of 2-byte characters in MASM, UTF-16 wchar for winapi functions?

I need to create a format string for wsprintfW, so each character should be 2 bytes, UTF-16. I have a Unicode string to print via WriteConsoleW, but the format string is in ASCII format, so it doesn't actually print wchar but 1 byte each. fstr dW "DllName:…

assembly unicode x86 masm utf-16

asked Oct 26 '22 at 14:39

mantissa

1 answer

System.out.println() behavior on surrogate pair char[]

When I execute the code: char[] c1 = {'a','b'}; char[] c2 = Character.toChars(0x10437); System.out.println(c1); System.out.println(c2); I get: ab By reading the documentation of Character.toChars, I get that c2 is a char array of size 2, which is…

java char utf-16 println

asked Oct 11 '22 at 15:18

Ida

2 answers

Must use UTF-16 Url encoding to submit a search in Java. How can I?

A certain site (which is not under my control) has an internal search engine that uses GET requests that look like: something.com/search?query=%u0001%0101, which I would like to use in my Java code. To my understanding this is a not so common way…

java url-encoding utf-16

asked Sep 06 '11 at 20:00

DannyA

2 answers

How do you determine the byte width of a UTF-16 character?

What are the rules for reading a UTF-16 byte stream, to determine how many bytes a character takes up? I've read the standards, but based on empirical observations of real-world UTF-16 encoded streams, it looks like there are certain where the…

unicode utf-16 combining-marks ucs

asked Apr 24 '21 at 15:40

Rab

2 answers

How UTF-16 encoding uses surrogate code points?

According to the Unicode specification, D91 UTF-16 encoding form: The Unicode encoding form that assigns each Unicode scalar value in the ranges U+0000..U+D7FF and U+E000..U+FFFF to a single unsigned 16-bit code unit with the same numeric value…

unicode utf-16

asked Mar 12 '21 at 18:37

Ilya Loskutov

1 answer

Encoding conversion for large file

I am faced with a large (~ 18 GB) file, exported from SQL Server as a Unicode text file, which means its encoding is UTF-16 (little endian). The file is now stored in a computer running Linux, but I have not figured out a way to convert it to…

utf-8 large-files utf-16 iconv

asked Jul 08 '11 at 17:23

Jose L. Lykón

2 answers

How do I get Delphi 2006 TStringList.LoadFromFile to load UTF-16 files

I have a Delphi 2006 app that I am adding code to process some generated CSV data files. TStringList.LoadFromFile was giving strange results and I have just worked out the files are UTF-16 encoded. Upgrading to XE is planned but not an option at…

delphi character-encoding ascii utf-16 delphi-2006

asked Jun 28 '11 at 23:42

rossmcm

1 answer

How do I print a UTF-16 string in Zig?

I've been trying to code a UTF-16 string structure, and although the standard library provides a unicode module, it doesn't seem to provide a way to print out a slice of u16. I've tried this: const std = @import("std"); const unicode =…

unicode utf-16 zig

asked Nov 20 '20 at 20:03

Sapphire_Brick

1 answer

Save BOM with File

Can someone please tell me how to save the byte order marker (BOM) with a file? For example, I save a text file now like: NSString *currentFileContent = @"This is a string of text to represent file content."; NSString *currentFileName =…

iphone utf-8 rtf utf-16 byte-order-mark

asked Jun 23 '11 at 21:05

DenVog

2 answers

String Encoding with Emoji in Java?

I have small test example like this public class Main { public static void main(String[] args) { String s = ""; System.out.println(s); System.out.println(s.length()); …

java encoding utf-8 utf-16

asked Oct 05 '20 at 17:23

Shaw

2 answers

Ruby 1.8 Iconv UTF-16 to UTF-8 fails with "\000" (Iconv::InvalidCharacter)

I am having trouble handling text files of tabulated data generated on a Windows machine. I'm working in Ruby 1.8. The following gives an error ("\000" (Iconv::InvalidCharacter)) when processing the SECOND line from the file. The first line is…

ruby utf-8 character-encoding utf-16 iconv

asked May 30 '11 at 03:44

NAD
