Questions tagged [utf-16]

UTF-16 is a variable-width character encoding that represents each Unicode code point as either two or four bytes, that is, one or two 16-bit code units. Code points above U+FFFF take four bytes, encoded as a surrogate pair.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
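Here is a minimal Python sketch of that algorithm. The function name `utf16_code_units` is made up for this example and is not part of any library. Code points below U+10000 become a single 16-bit code unit. For code points from U+10000 to U+10FFFF, 0x10000 is subtracted and the remaining 20 bits are split into two parts: a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits).

```python
def utf16_code_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as a list of 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        # Basic Multilingual Plane: one code unit (2 bytes) equal to the code point
        return [cp]
    # Supplementary planes: subtract 0x10000 to get a 20-bit value
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return [high, low]           # two code units (4 bytes)
```

For example, `utf16_code_units(0x1D11E)` returns `[0xD834, 0xDD1E]`. These are the same two code units that Python's built-in `"\U0001D11E".encode("utf-16-be")` produces as the bytes `d8 34 dd 1e`.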

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and BOM-prefixed UTF-16, where a leading byte order mark indicates the byte order (see ).
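Python's standard codecs `utf-16-le`, `utf-16-be` and `utf-16` make the byte-order differences easy to see. This is a small sketch, not a specification. One caveat: the plain `utf-16` codec writes its BOM in the platform's native byte order, so the first two bytes depend on the machine. They are `ff fe` on little-endian hardware.

```python
text = "A\u611b"  # 'A' (U+0041) followed by 愛 (U+611B)

le = text.encode("utf-16-le")     # little-endian, no BOM
be = text.encode("utf-16-be")     # big-endian, no BOM
with_bom = text.encode("utf-16")  # BOM first, then the platform's native byte order

print(le.hex(" "))            # 41 00 1b 61
print(be.hex(" "))            # 00 41 61 1b
print(with_bom[:2].hex(" "))  # ff fe on little-endian machines, fe ff on big-endian ones

# The "utf-16" decoder reads the BOM, picks the byte order, and strips the BOM.
assert with_bom.decode("utf-16") == text
```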


1193 questions
7 votes, 1 answer

Escaping unicode characters with C/C++

I need to escape Unicode characters within an input string as either UTF-16 or UTF-32 escape sequences. For example, the input string literal "Eat, drink, 愛" should be escaped as "Eat, drink, \u611b". Here are the rules in a table of sorts: Escape |…
user152949
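The question asks for C/C++, and its rules table is cut off. The UTF-16 half of the idea can still be sketched in Python under some assumptions. ASCII characters pass through unchanged. Every other character becomes one `\uXXXX` escape per UTF-16 code unit, using lowercase hex to match the example, so characters above U+FFFF come out as a surrogate pair. The function name `escape_non_ascii` is hypothetical, and UTF-32 style escapes are not covered.

```python
def escape_non_ascii(s: str) -> str:
    """Replace every non-ASCII character with \\uXXXX escapes of its UTF-16 code units."""
    out = []
    for ch in s:
        if ord(ch) < 0x80:
            out.append(ch)  # assumption: ASCII characters are left as-is
            continue
        units = ch.encode("utf-16-be")  # 2 bytes for BMP characters, 4 for a surrogate pair
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


print(escape_non_ascii("Eat, drink, 愛"))  # Eat, drink, \u611b
print(escape_non_ascii("\U0001D11E"))      # \ud834\udd1e
```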
7 votes, 3 answers

Defining 4-byte UTF-16 character in a string

I have read a question about UTF-8, UTF-16 and UCS-2, and almost all answers state that UCS-2 is obsolete and C# uses UTF-16. However, all my attempts to create the 4-byte character U+1D11E in C# failed, so I actually think C# uses the…
Thomas Weller
7 votes, 1 answer

XML Spec and UTF-16

Section 4.3.3 and Appendix F of the XML 1.0 spec speak about UTF-16, the byte order mark (BOM) in UTF-16 encoded data streams, and the XML encoding declaration. From the information in those sections, it would seem that a byte order mark is…
Mike Menzel
7 votes, 2 answers

How can I use the Linux sed command to process a little-endian UTF-16 file?

I am working on an application related to Windows RDP. I run into a problem when I use the sed command to replace the IP address string directly in the .rdp file: after running the command, the original .rdp file is garbled. sed -i…
liuan
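sed matches bytes. In a little-endian UTF-16 file, every ASCII character is stored as its byte followed by 0x00 (for example, `1` is `31 00`), so a pattern typed as ordinary ASCII does not match the text as it is actually stored. One way around this is to decode the file, edit the text, and encode it again. The sketch below does that in Python. `replace_in_utf16le_file` is a hypothetical helper, and it assumes the whole file is UTF-16LE. A leading BOM, if present, is carried through unchanged as U+FEFF.

```python
from pathlib import Path


def replace_in_utf16le_file(path: Path, old: str, new: str) -> None:
    """Hypothetical helper: replace text in a UTF-16LE file, keeping its encoding."""
    raw = path.read_bytes()
    # "utf-16-le" does not strip a BOM: bytes FF FE decode to U+FEFF and are re-encoded as FF FE.
    text = raw.decode("utf-16-le")
    path.write_bytes(text.replace(old, new).encode("utf-16-le"))
```

A call would look like `replace_in_utf16le_file(Path("example.rdp"), "10.0.0.1", "192.168.1.5")`, where the file name and addresses are made up. From the shell, the same round trip is often done by converting with iconv (`-f UTF-16LE -t UTF-8`), running sed on the UTF-8 text, and converting back.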
7 votes, 2 answers

Tcl for getting ASCII code for every character in a string

I need to get the ASCII code for every character in a string. Actually, it's every character in a (small) file. The first three lines of the following successfully pull all of a file's contents into a string (per this recipe): set fp [open…
Dexygen
7 votes, 1 answer

How to force UTF-16 while reading/writing in Java?

I see that you can specify UTF-16 as the charset via Charset.forName("UTF-16"), and that you can create a new UTF-16 decoder via Charset.forName("UTF-16").newDecoder(), but I only see the ability to specify a CharsetDecoder on InputStreamReader's…
IAmYourFaja
7 votes, 1 answer

How to convert a utf16 ushort array to a utf8 std::string?

Currently I'm writing a plugin that is just a wrapper around an existing library. The plugin's host passes me a UTF-16 formatted string defined as follows: typedef unsigned short PA_Unichar; and the wrapped library accepts only a const char*…
Robotex
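The question is about C++, but the conversion follows the same loop in any language. First, combine each surrogate pair into a code point. Then write that code point as one to four UTF-8 bytes. Below is a minimal Python sketch that assumes the host's PA_Unichar values arrive as a list of 16-bit integers. `utf16_units_to_utf8` is a hypothetical name. Replacing unpaired surrogates with U+FFFD is a policy choice of this sketch, not something the question specifies.

```python
def utf16_units_to_utf8(units: list[int]) -> bytes:
    """Hypothetical helper: convert a sequence of UTF-16 code units to UTF-8 bytes."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by low surrogate: rebuild the supplementary code point
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            cp = 0xFFFD  # unpaired surrogate: substitute U+FFFD (policy choice)
            i += 1
        else:
            cp = u       # BMP code unit is the code point itself
            i += 1
        # Write cp as UTF-8: 1, 2, 3 or 4 bytes depending on its range
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes(out)
```

In C++, the same loop would append the bytes to a std::string, whose `c_str()` can then be passed to the wrapped library.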
6 votes, 5 answers

Convert wchar_t* to UTF-16 string

I need C++ code to convert a string given as wchar_t* to a UTF-16 string. It must work on both Windows and Linux. I've looked through a lot of web pages while searching, but the subject is still not clear to me. As I understand it, I need…
Andrei Baskakov
6 votes, 2 answers

UCS-2 and SQL Server

While researching options for storing mostly-English-but-sometimes-not data in a SQL Server database that can potentially be quite large, I'm leaning toward storing most string data as UTF-8 encoded. However, Microsoft chose UCS-2 for reasons that I…
Eric J.
6 votes, 5 answers

grep UTF-16 support

I used TextEdit on macOS to create two files with the same contents but different encodings, then ran grep xxx filename_UTF-16 (no output) and grep xxx filename_UTF-8 (output: xxxxxxx xxxxxxyyyyyy). Does grep not support UTF-16?
toughtalker
6 votes, 4 answers

What should I know to make my I18N application work in Japanese?

I'm working on an I18N application that will be localized into Japanese. I don't know a word of Japanese, and my first question is whether UTF-8 is enough for that language. Usually, for European languages, UTF-8 is enough, and I have to set up my database…
Boris Guéry
6 votes, 3 answers

C: Most efficient way to determine how many bytes will be needed for a UTF-16 string from a UTF-8 string

I've seen some very clever code out there for converting between Unicode code points and UTF-8, so I was wondering if anybody has (or would enjoy devising) this. Given a UTF-8 string, how many bytes are needed for the UTF-16 encoding of the same…
hippietrail
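The count can be computed from the UTF-8 lead bytes alone, assuming the input is already valid UTF-8. Lead bytes of 1-, 2- and 3-byte sequences encode code points up to U+FFFF, which need one 16-bit code unit (2 bytes). Lead bytes 0xF0 and above start 4-byte sequences, whose code points are at or above U+10000 and need a surrogate pair (4 bytes). Continuation bytes (10xxxxxx) are skipped. Here is a minimal sketch in Python rather than C, with a hypothetical function name and no validation; the loop carries over to C almost line for line.

```python
def utf16_size_from_utf8(data: bytes) -> int:
    """Bytes needed to re-encode valid UTF-8 as UTF-16 (no BOM). Input is NOT validated."""
    size = 0
    for b in data:
        if (b & 0xC0) == 0x80:
            continue      # continuation byte (10xxxxxx): counted with its lead byte
        if b >= 0xF0:
            size += 4     # 4-byte sequence: code point >= U+10000, needs a surrogate pair
        else:
            size += 2     # 1-, 2- or 3-byte sequence: one 16-bit code unit
    return size


print(utf16_size_from_utf8("Eat, drink, 愛".encode("utf-8")))  # 26
```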
6 votes, 1 answer

Perl6 NativeCall with Str is encoded('utf16') gives randomly corrupted results

I am mapping the GetFullPathName Windows API in a perl6 script using NativeCall; to do so, I wrote the following: #!perl6 use NativeCall; constant \WIN32_MAX_PATH = 260; #I may use directly $path.IO.absolute() sub Win32-GetFullPathName( …
xlat
6 votes, 5 answers

Total number of UTF16 Characters

Can you show, by counting permutations/combinations, that UTF-16 encoding represents 1,112,064 numbers?
user4344
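The figure can be checked directly. Unicode has 17 planes of 65,536 code points (U+0000 to U+10FFFF). The 2,048 code points from U+D800 to U+DFFF are reserved for surrogates and are not encodable characters. Counting from UTF-16's own structure gives the same result: every non-surrogate 16-bit value is one code point, and each of the 1,024 high surrogates can pair with each of the 1,024 low surrogates. A short Python check of that arithmetic:

```python
# All code points U+0000..U+10FFFF: 17 planes * 65,536
total_code_points = 0x110000
# Surrogate code points U+D800..U+DFFF cannot be encoded on their own
surrogates = 0xDFFF - 0xD800 + 1               # 2,048
scalar_values = total_code_points - surrogates

# Same count from UTF-16's structure:
single_unit = 0x10000 - surrogates             # non-surrogate 16-bit values: 63,488
pairs = 1024 * 1024                            # high x low surrogate combinations: 1,048,576
assert single_unit + pairs == scalar_values

print(scalar_values)  # 1112064
```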
6 votes, 1 answer

Remove accents in string except "ñ"

I have the following example code: var inputString = "ñaáme"; inputString = inputString.Replace('ñ', '\u00F1'); var normalizedString = inputString.Normalize(NormalizationForm.FormD); var result = Regex.Replace(normalizedString, @"[^ñÑa-zA-Z0-9\s]*",…
HenryGuillen17
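The question's code is C#, but the general approach can be sketched in Python with the standard `unicodedata` module: decompose to NFD, drop combining marks, and leave ñ and Ñ alone. The function name is hypothetical. Normalizing to NFC first is a choice of this sketch. It ensures that an ñ written as `n` plus U+0303 COMBINING TILDE is kept, just like the precomposed U+00F1.

```python
import unicodedata


def strip_accents_keep_enye(s: str) -> str:
    """Hypothetical helper: remove diacritics from every letter except ñ/Ñ."""
    # Compose first so that "n" + U+0303 becomes the single character U+00F1.
    s = unicodedata.normalize("NFC", s)
    out = []
    for ch in s:
        if ch in ("\u00f1", "\u00d1"):  # ñ, Ñ
            out.append(ch)  # keep the eñe untouched
            continue
        # Decompose (e.g. "á" -> "a" + U+0301) and drop the combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(strip_accents_keep_enye("ñaáme"))  # ñaame
```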