Questions tagged [utf-16]

UTF-16 is a variable-width character encoding that represents each Unicode code point as one or two 16-bit code units, that is, as a byte sequence of either two or four bytes.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
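
As a rough illustration of that algorithm, here is a minimal Python sketch. The function name `encode_utf16_code_units` is made up for this example and is not taken from the RFC. Code points below U+10000 map to a single code unit; higher ones are split into a high and a low surrogate.

```python
from typing import List


def encode_utf16_code_units(code_point: int) -> List[int]:
    """Return the 16-bit UTF-16 code units for one Unicode code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Basic Multilingual Plane: one code unit, same value as the code point.
        return [code_point]
    # Supplementary planes: subtract 0x10000, leaving a 20-bit value.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits -> low surrogate
    return [high, low]


print([hex(u) for u in encode_utf16_code_units(0x41)])     # ['0x41']
print([hex(u) for u in encode_utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
```

The sketch returns code units rather than bytes; the byte order in which each unit is serialized is a separate choice.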

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and with a BOM (byte order mark).
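
The difference between these flavors can be seen with Python's built-in codecs. This is a small sketch for illustration; the sample string is arbitrary, and the byte order chosen by the plain `utf-16` codec when encoding depends on the platform.

```python
import codecs

text = "A\U0001F600"  # one BMP character plus one supplementary character

le = text.encode("utf-16-le")     # little-endian, no BOM
be = text.encode("utf-16-be")     # big-endian, no BOM
with_bom = text.encode("utf-16")  # BOM first, then native byte order

print(le.hex(" "))  # 41 00 3d d8 00 de
print(be.hex(" "))  # 00 41 d8 3d de 00

# The BOM is U+FEFF serialized in the chosen byte order.
print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True

# When decoding with "utf-16", the BOM selects the byte order and is dropped.
print(with_bom.decode("utf-16") == text)  # True
```

Decoding with `utf-16-le` or `utf-16-be` does not strip a leading BOM, so data that starts with one is usually read with the plain `utf-16` codec.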

1193 questions
4
votes
0 answers

Using gperf on UTF-16 encoded input?

When moving code that uses a gperf-generated hashing function to use UTF-16 for its strings, how would you adapt/call the hashing function? The options I can see are: Convert UTF-16 to UTF-8 for the hashing. This should work out-of-the-box, but…
Christopher Creutzig
  • 8,656
  • 35
  • 45
4
votes
1 answer

Web API not able to bind model for POST with utf-16 encoded XML

I have a simple Web API controller with a POST method, that accepts an object. When the clients posts data as JSON the API works fine. Even when data is sent as XML with encoding="utf-8", the model binds seamlessly (I have added the following line…
Arghya C
  • 9,805
  • 2
  • 47
  • 66
4
votes
2 answers

Reading UTF-16 file in Inno Setup Pascal script

I have an .inf file exported from Resource Hacker. The file is in UTF-16 LE encoding. EXTRALARGELEGENDSII_INI TEXTFILE "Data.bin" LARGEFONTSLEGENDSII_INI TEXTFILE "Data_2.bin" NORMALLEGENDSII_INI TEXTFILE "Data_3.bin" THEMES_INI TEXTFILE…
Blueeyes789
  • 543
  • 6
  • 18
4
votes
1 answer

Understanding Unicode: Surrogate Blocks, Noncharacters

I am trying to actually understand the unicode standard and was poking through the xml spec where it reads: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the…
Henning
  • 579
  • 6
  • 17
4
votes
3 answers

How can I convert UTF-16 to UTF-32 in java?

I have looked for solutions, but there doesn't seem to be much on this topic. I have found solutions that suggest: String unicodeString = new String("utf8 here"); byte[] bytes = String.getBytes("UTF8"); String converted = new…
Daniel Medina Sada
  • 478
  • 1
  • 5
  • 16
4
votes
2 answers

utfcpp and Win32 wide API

Is it good/safe/possible to use the tiny utfcpp library for converting everything I get back from the wide Windows API (FindFirstFileW and such) to a valid UTF8 representation using utf16to8? I would like to use UTF8 internally, but am having…
rubenvb
  • 74,642
  • 33
  • 187
  • 332
4
votes
1 answer

How to read a user's input from the console into a Unicode string?

A C++ beginner's question. Here is what I have currently: // From tchar.h #define _T(x) __T(x) ... // From tchar.h #define __T(x) L ## x ... // In MySampleCode.h #ifdef _UNICODE #define tcout wcout #else #define tcout…
Hamish Grubijan
  • 10,562
  • 23
  • 99
  • 147
4
votes
1 answer

create UTF-16LE with BOM and CRLF line separator on Windows

I need to produce some UTF-16LE encoded files with CRLF line separators on a Windows 7 box. (Currently with a Strawberry 5.20.1) I needed to mess a long time before getting a correct output and I wonder if my solution is the correct way to do…
Seki
  • 11,135
  • 7
  • 46
  • 70
4
votes
3 answers

Advice on marshalled string that can be either ASCII or UTF-16

Welcome to unsafe land. I'm doing P/Invoke to a legacy lib that gives me a 0-terminated C-style string in the form of an unknown-length unmanaged byte buffer that can be either ASCII or UTF-16, but without giving any indication whatsoever thereof -…
4
votes
1 answer

Query MySQL with unicode char code

I have been having trouble searching through a MySQL table, trying to find entries with the character (UTF-16 code 200E) in a particular column. This particular code doesn't have a glyph, so it doesn't seem to work when I try to paste it into my…
Ben
  • 7,692
  • 15
  • 49
  • 64
4
votes
1 answer

What charset to use for json with base64 encoded binary data?

What is the most space efficient charset for JSON (UTF-8/16/32) for use of base64 encoded binary data? { data: "jA0EAwMCxamDRMfOGV5gyZPnyX1BB" }
Sebastian Barth
  • 4,079
  • 7
  • 40
  • 59
4
votes
2 answers

Reading contents from UTF-16 encoded file in Ruby

I want to read the contents of a file and save it into a variable. Normally I would do something like: text = File.read(filepath) Unfortunately there's a file I'm working with that is encoded with UTF-16LE. I've been doing some research and it…
Stew C
  • 697
  • 3
  • 10
  • 24
4
votes
1 answer

Reading a CSV w/ CFFile & Non-Roman Characters

Update: The original CSV was created in Excel; when I copied the data in to a Google Spreadsheet and downloaded a CSV from Drive, it works fine. I'm guessing there's an encoding issue w/ the Excel CSV? Is there any way to work around this w/ Excel…
shimmoril
  • 682
  • 1
  • 11
  • 22
4
votes
1 answer

UTF-16 to UTF-8 using ICU library

I wanted to convert UTF-16 strings to UTF-8. I came across the ICU library by Unicode. I am having problems doing the conversion as the default is UTF-16. I have tried using converter: UErrorCode myError = U_ZERO_ERROR; UConverter *conv =…
4
votes
2 answers

How to convert UTF-8 <-> UTF16 portable

Is there a simple, portable way (win32, linux at least) to convert UTF-16 to UTF-8 and back? Preferably using boost. Thx for your help, Tobias
Tobias Langner
  • 10,634
  • 6
  • 46
  • 76