Questions tagged [utf-16]

UTF-16 is a character encoding that represents each Unicode code point using either 2 or 4 bytes.

UTF-16 encodes each code point as one or two 16-bit code units, that is, as a byte sequence of either two or four bytes. It is therefore a variable-width character encoding.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
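As a rough illustration of that algorithm, the sketch below maps one code point to its UTF-16 code units. The function name `encode_utf16_units` is hypothetical and chosen only for this example; in practice Python's built-in `str.encode("utf-16-be")` does the same job.

```python
def encode_utf16_units(code_point):
    """Return the UTF-16 code units (16-bit integers) for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Basic Multilingual Plane: one code unit with the same numeric value.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


print([hex(u) for u in encode_utf16_units(0x10437)])  # ['0xd801', '0xdc37']
```

Code points below U+10000 take one code unit (two bytes); everything above takes a surrogate pair (four bytes).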

There are three flavors of UTF-16: little-endian, big-endian, and with a BOM (byte order mark).
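A small sketch of how the three flavors differ at the byte level, using Python's standard codecs. The sample string is arbitrary, and the byte order that the generic `"utf-16"` codec writes after the BOM depends on the platform, so the example only checks that a BOM is present.

```python
import codecs

text = "A"  # U+0041, a single 16-bit code unit

little = text.encode("utf-16-le")  # no BOM, low byte first
big = text.encode("utf-16-be")     # no BOM, high byte first
with_bom = text.encode("utf-16")   # BOM first, then platform byte order

assert little == b"A\x00"
assert big == b"\x00A"
assert with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# When decoding with the generic "utf-16" codec, the BOM selects the byte order.
assert (codecs.BOM_UTF16_BE + big).decode("utf-16") == "A"
assert (codecs.BOM_UTF16_LE + little).decode("utf-16") == "A"
```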

1193 questions
3
votes
3 answers

Templated string class use of strcmp, strcpy and strlen

I overheard a discussion some time ago about how, when creating a templated string class that can make use of UTF8 and UTF16, you should not use strcmp, strcpy and strlen. From what I recall, you are supposed to use…
mmurphy
  • 1,327
  • 4
  • 15
  • 30
3
votes
1 answer

Differences in string class implementations

Why are string classes implemented in several different ways, and what are the advantages and disadvantages? I have seen it done several different ways: Only using a simple char (most basic way). Supporting UTF8 and UTF16 through a templated string,…
mmurphy
  • 1,327
  • 4
  • 15
  • 30
3
votes
1 answer

gnu-binutils-strings utf-8 instead of utf-16 or ascii

I've noticed gnu-binutils-strings can print out UTF-16 content in a file. Is it possible for the program to print out UTF-8 strings? If so, which arguments are appropriate? I'm working in a Python environment using subprocess and would like to work…
ct_
  • 1,189
  • 4
  • 20
  • 34
3
votes
1 answer

In Qt how do QTextCodec::codecForName("UTF-16") and codecForName("UTF-32") decide the endianness to use?

In the Qt documentation it states that (among others) the following Unicode string encodings are supported: UTF-8 UTF-16 UTF-16BE UTF-16LE UTF-32 UTF-32BE UTF-32LE Due to the three different codecs listed for 2 and 4 octet encoded Unicode, I was…
Samuel Harmer
  • 4,264
  • 5
  • 33
  • 67
3
votes
0 answers

How to create string of 2-byte characters in MASM, UTF-16 wchar for winapi functions?

I need to create a format string for wsprintfW, so each character should be 2 bytes, UTF-16. I have a Unicode string to print via WriteConsoleW, but the format string is in ASCII, so it prints 1 byte per character instead of wchar. fstr dW "DllName:…
mantissa
  • 132
  • 7
3
votes
1 answer

System.out.println() behavior on surrogate pair char[]

When I execute the code: char[] c1 = {'a','b'}; char[] c2 = Character.toChars(0x10437); System.out.println(c1); System.out.println(c2); I get: ab By reading the documentation of Character.toChars, I get that c2 is a char array of size 2, which is…
Ida
  • 2,919
  • 3
  • 32
  • 40
3
votes
2 answers

Must use UTF-16 URL encoding to submit a search in Java. How can I?

A certain site (which is not under my control) has an internal search engine that uses GET requests that look like: something.com/search?query=%u0001%0101, which I would like to use in my Java code. To my understanding this is a not so common way…
DannyA
  • 1,571
  • 2
  • 17
  • 28
3
votes
2 answers

How do you determine the byte width of a UTF-16 character?

What are the rules for reading a UTF-16 byte stream, to determine how many bytes a character takes up? I've read the standards, but based on empirical observations of real-world UTF-16 encoded streams, it looks like there are certain where the…
Rab
  • 445
  • 4
  • 11
3
votes
2 answers

How does UTF-16 encoding use surrogate code points?

According to the Unicode specification, D91 (UTF-16 encoding form): The Unicode encoding form that assigns each Unicode scalar value in the ranges U+0000..U+D7FF and U+E000..U+FFFF to a single unsigned 16-bit code unit with the same numeric value…
Ilya Loskutov
  • 1,967
  • 2
  • 20
  • 34
3
votes
1 answer

Encoding conversion for large file

I am faced with a large (~ 18 GB) file, exported from SQL Server as a Unicode text file, which means its encoding is UTF-16 (little endian). The file is now stored in a computer running Linux, but I have not figured out a way to convert it to…
3
votes
2 answers

How do I get Delphi 2006 TStringList.LoadFromFile to load UTF-16 files

I have a Delphi 2006 app that I am adding code to process some generated CSV data files. TStringList.LoadFromFile was giving strange results and I have just worked out the files are UTF-16 encoded. Upgrading to XE is planned but not an option at…
rossmcm
  • 5,493
  • 10
  • 55
  • 118
3
votes
1 answer

How do I print a UTF-16 string in Zig?

I've been trying to code a UTF-16 string structure, and although the standard library provides a unicode module, it doesn't seem to provide a way to print out a slice of u16. I've tried this: const std = @import("std"); const unicode =…
Sapphire_Brick
  • 1,560
  • 12
  • 26
3
votes
1 answer

Save BOM with File

Can someone please tell me how to save the byte order marker (BOM) with a file? For example, I save a text file now like: NSString *currentFileContent = @"This is a string of text to represent file content."; NSString *currentFileName =…
DenVog
  • 4,226
  • 3
  • 43
  • 72
3
votes
2 answers

String Encoding with Emoji in Java?

I have a small test example like this: public class Main { public static void main(String[] args) { String s = ""; System.out.println(s); System.out.println(s.length()); …
Shaw
  • 105
  • 1
  • 1
  • 6
3
votes
2 answers

Ruby 1.8 Iconv UTF-16 to UTF-8 fails with "\000" (Iconv::InvalidCharacter)

I am having trouble handling text files of tabulated data generated on a Windows machine. I'm working in Ruby 1.8. The following gives an error ("\000" (Iconv::InvalidCharacter)) when processing the SECOND line from the file. The first line is…
NAD
  • 615
  • 1
  • 7
  • 20