Questions tagged [utf-16]

UTF-16 is a variable-width character encoding that represents each Unicode code point as a sequence of either two or four bytes, that is, one or two 16-bit code units. Code points above U+FFFF take four bytes, encoded as a surrogate pair.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
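
As a rough illustration of that algorithm, here is a minimal Python sketch. The function name `utf16_code_units` is hypothetical and chosen only for this example; the sketch returns 16-bit code units and leaves byte order aside.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units (16-bit integers) for one code point."""
    # Surrogate values and anything past U+10FFFF cannot be encoded.
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    # Code points up to U+FFFF are stored as a single code unit.
    if code_point < 0x10000:
        return [code_point]
    # Otherwise subtract 0x10000 and split the remaining 20 bits
    # into a high and a low surrogate of 10 bits each.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)
    low = 0xDC00 | (offset & 0x3FF)
    return [high, low]


print([hex(u) for u in utf16_code_units(0x611B)])   # ['0x611b']
print([hex(u) for u in utf16_code_units(0x1D11E)])  # ['0xd834', '0xdd1e']
```

The second call shows why U+1D11E occupies four bytes: it lies above U+FFFF and is therefore written as a surrogate pair.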

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and UTF-16 with a byte order mark (BOM) that signals the byte order (see endianness).
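
The difference between the flavors can be seen with Python's built-in codecs. This is only a sketch: the sample string is arbitrary, and the byte order that the plain "utf-16" codec writes after the BOM depends on the platform.

```python
import codecs

text = "A\U0001D11E"  # 'A' plus U+1D11E, which needs a surrogate pair

little = text.encode("utf-16-le")  # no BOM, least significant byte first
big = text.encode("utf-16-be")     # no BOM, most significant byte first
with_bom = text.encode("utf-16")   # BOM first, then platform byte order

print(little.hex(" "))  # 41 00 34 d8 1e dd
print(big.hex(" "))     # 00 41 d8 34 dd 1e

# The BOM is U+FEFF written in the chosen byte order.
print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True

# Decoding with "utf-16" reads the BOM to pick the byte order and drops it.
print(with_bom.decode("utf-16") == text)  # True
```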

Related tags

The unicode character set it serializes
Other UTFs: utf-8, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36

1193 questions

votes

1 answer

Escaping unicode characters with C/C++

I need to escape Unicode characters within an input string to either UTF-16 or UTF-32 escape sequences. For example, the input string literal "Eat, drink, 愛" should be escaped as "Eat, drink, \u611b". Here are the rules in a table of sorts: Escape |…

c++ unicode utf-16 utf-32

asked May 24 '14 at 10:10

user152949

votes

3 answers

Defining 4-byte UTF-16 character in a string

I have read a question about UTF-8, UTF-16 and UCS-2 and almost all answers give the statement that UCS-2 is obsolete and C# uses UTF-16. However, all my attempts to create the 4-byte character U+1D11E in C# failed, so I actually think C# uses the…

c# unicode encoding character-encoding utf-16

asked Jan 01 '14 at 23:38

Thomas Weller

votes

1 answer

XML Spec and UTF-16

Section 4.3.3 and Appendix F of the XML 1.0 spec speak about UTF-16, the byte order mark (BOM) in UTF-16 encoded data streams, and the XML encoding declaration. From the information in those sections, it would seem that a byte order mark is…

xml unicode w3c utf-16 specifications

asked Dec 19 '13 at 21:55

Mike Menzel

votes

2 answers

How can I use the Linux sed command to process a little-endian UTF-16 file?

I am working on an application involving Windows RDP. I run into a problem when I use the sed command to replace the IP address string directly in the rdp file: after the command runs, the original rdp file is garbled. sed -i…

shell utf-16 endianness

asked Jul 19 '13 at 13:53

liuan

votes

2 answers

Tcl for getting ASCII code for every character in a string

I need to get the ASCII character for every character in a string. Actually, it's every character in a (small) file. The following first 3 lines successfully pull all of a file's contents into a string (per this recipe): set fp [open…

string list ascii tcl utf-16

asked Nov 04 '09 at 18:15

Dexygen

votes

1 answer

How to force UTF-16 while reading/writing in Java?

I see that you can specify UTF-16 as the charset via Charset.forName("UTF-16"), and that you can create a new UTF-16 decoder via Charset.forName("UTF-16").newDecoder(), but I only see the ability to specify a CharsetDecoder on InputStreamReader's…

java file-io character-encoding utf-16

asked Feb 26 '13 at 20:02

IAmYourFaja

votes

1 answer

How to convert a utf16 ushort array to a utf8 std::string?

Currently I'm writing a plugin which is just a wrapper around an existing library. The plugin's host passes me a UTF-16 formatted string defined as follows: typedef unsigned short PA_Unichar; and the wrapped library accepts only a const char*…

c++ utf-8 c++11 utf-16

asked Dec 15 '12 at 09:17

Robotex

votes

5 answers

Convert wchar_t* to UTF-16 string

I need C++ code to convert a string given as wchar_t* to a UTF-16 string. It must work on both Windows and Linux. I've looked through a lot of web pages during the search, but the subject is still not clear to me. As I understand it, I need…

c++ c unicode utf-16 wchar-t

asked Mar 14 '12 at 06:51

Andrei Baskakov

votes

2 answers

UCS-2 and SQL Server

While researching options for storing mostly-English-but-sometimes-not data in a SQL Server database that can potentially be quite large, I'm leaning toward storing most string data as UTF-8 encoded. However, Microsoft chose UCS-2 for reasons that I…

sql-server unicode utf-8 utf-16 ucs2

asked Jan 25 '12 at 18:22

Eric J.

votes

5 answers

grep unicode 16 support

I used TextEdit on Mac OS X to create two files with the same contents but different encodings. Then: grep xxx filename_UTF-16 prints nothing, while grep xxx filename_UTF-8 prints xxxxxxx xxxxxxyyyyyy. Does grep not support UTF-16?

linux unicode utf-8 grep utf-16

asked Jul 30 '11 at 08:45

toughtalker

votes

4 answers

What should I know to make my I18N application work in Japanese?

I'm working on an I18N application which will be localized in Japanese. I don't know a word of Japanese, and I'm first wondering whether UTF-8 is enough for that language. Usually, for European languages, UTF-8 is enough, and I have to set up my database…

php utf-8 internationalization gettext utf-16

asked Jun 01 '11 at 10:02

Boris Guéry

votes

3 answers

C: Most efficient way to determine how many bytes will be needed for a UTF-16 string from a UTF-8 string

I've seen some very clever code out there for converting between Unicode codepoints and UTF-8 so I was wondering if anybody has (or would enjoy devising) this. Given a UTF-8 string, how many bytes are needed for the UTF-16 encoding of the same…

c algorithm utf-8 utf-16 unicode-string

asked Apr 20 '11 at 09:16

hippietrail

votes

1 answer

Perl6 NativeCall with Str is encoded('utf16') got randomly corrupted result

I am mapping the GetFullPathName Windows API in a Perl 6 script using NativeCall, so I wrote the following: #!perl6 use NativeCall; constant \WIN32_MAX_PATH = 260; #I may use directly $path.IO.absolute() sub Win32-GetFullPathName( …

utf-16 raku nativecall

asked Dec 09 '18 at 10:57

xlat

votes

5 answers

Total number of UTF16 Characters

Can you calculate that a UTF-16 encoding represents 1,112,064 numbers by permutations/combinations?

unicode character-encoding utf-16

asked Feb 13 '11 at 12:24

user4344

votes

1 answer

Remove accents in string except "ñ"

I have the following example code: var inputString = "ñaáme"; inputString = inputString.Replace('ñ', '\u00F1'); var normalizedString = inputString.Normalize(NormalizationForm.FormD); var result = Regex.Replace(normalizedString, @"[^ñÑa-zA-Z0-9\s]*",…

c# regex normalization utf-16

asked Nov 25 '17 at 17:03

HenryGuillen17
