Questions tagged [utf-16]

UTF-16 is a character encoding that represents Unicode code points using either 2 or 4 bytes per character.

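Each Unicode code point is stored as one or two 16-bit code units: code points up to U+FFFF (the Basic Multilingual Plane) take one unit (two bytes), and code points above U+FFFF take a surrogate pair (four bytes). It is therefore a variable-width character encoding.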
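As a quick illustration of the variable width, here is a minimal Python sketch (the helper name `utf16_byte_length` is hypothetical) that encodes single code points with Python's built-in `utf-16-le` codec, chosen so that no byte order mark is added, and reports how many bytes each one needs.

```python
def utf16_byte_length(ch: str) -> int:
    """Number of bytes one code point occupies in UTF-16 (no BOM)."""
    # "utf-16-le" is used because the plain "utf-16" codec would also
    # prepend a 2-byte byte order mark to the output.
    return len(ch.encode("utf-16-le"))


for ch in ["A", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {utf16_byte_length(ch)} bytes")
# U+0041: 2 bytes
# U+20AC: 2 bytes
# U+1F600: 4 bytes
```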

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
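The surrogate-pair arithmetic behind the four-byte case can be sketched in a few lines of Python. This is a minimal illustration of the encoding steps RFC 2781 describes, not a production codec; the function names are hypothetical. The example uses U+1D10C, the code point that appears in one of the Python questions listed on this page.

```python
def encode_utf16_code_units(code_point: int) -> list[int]:
    """Encode one code point as one or two 16-bit UTF-16 code units."""
    # Surrogate code points (U+D800..U+DFFF) and values above U+10FFFF
    # cannot be encoded in UTF-16.
    if not (0 <= code_point <= 0x10FFFF) or (0xD800 <= code_point <= 0xDFFF):
        raise ValueError(f"cannot encode {code_point:#x} as UTF-16")
    if code_point < 0x10000:
        return [code_point]               # fits in a single code unit
    offset = code_point - 0x10000         # leaves a 20-bit value
    high = 0xD800 | (offset >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 | (offset & 0x3FF)       # bottom 10 bits -> low surrogate
    return [high, low]


def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a high/low surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


units = encode_utf16_code_units(0x1D10C)
print([f"{u:#06x}" for u in units])           # ['0xd834', '0xdd0c']
print(f"{decode_surrogate_pair(*units):#x}")  # 0x1d10c
```

The result agrees with Python's own codec: `"\U0001D10C".encode("utf-16-be")` yields the bytes D8 34 DD 0C.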

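There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and plain UTF-16, in which a leading byte order mark (BOM, U+FEFF) indicates the byte order (see endianness).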
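The flavors differ only in byte order and in whether a BOM is present. The Python sketch below (the `decode_utf16` helper is hypothetical) prints the two byte orders side by side and decodes input by checking for a leading BOM; when there is none it falls back to big-endian, which is what RFC 2781 recommends for unmarked text.

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, honouring a leading byte order mark (BOM)."""
    if data.startswith(codecs.BOM_UTF16_BE):      # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):      # FF FE
        return data[2:].decode("utf-16-le")
    # No BOM: assume big-endian. Files produced on Windows are often
    # little-endian without a BOM, so callers may want another fallback.
    return data.decode("utf-16-be")


text = "Hi"
print(text.encode("utf-16-be").hex(" "))   # 00 48 00 69
print(text.encode("utf-16-le").hex(" "))   # 48 00 69 00

# Python's plain "utf-16" codec prepends a BOM in the machine's native order.
print(decode_utf16(text.encode("utf-16")))  # Hi
```

A BOM check is only a heuristic: FF FE is also how the UTF-32LE BOM (FF FE 00 00) begins, so bytes such as FF FE 00 00 cannot be classified from the BOM alone.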

Related tags

unicode: the character set it serializes
Other UTFs: utf-8, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36

1193 questions

5 answers

Java implicit conversion of int to byte

I am about to start working on something that requires reading bytes and creating strings. The bytes being read represent UTF-16 strings. So just to test things out, I wanted to convert a simple byte array in UTF-16 encoding to a string. The first…

java encoding casting byte utf-16

asked Dec 20 '09 at 12:29

DaveJohnston

3 answers

Unicode BOM for UTF-16LE vs UTF-32LE

It seems like there's an ambiguity between the byte order marks used for UTF-16LE and UTF-32LE. In particular, consider a file that contains the following 8 bytes: FF FE 00 00 00 00 00 00. How can I tell if this file contains: The UTF-16LE BOM…

unicode character-encoding utf-16 file-type byte-order-mark

asked Dec 18 '09 at 18:36

Edward Loper

8 answers

Detect UTF-16 file content

Is it possible to know if a file has Unicode (16 bits per char) or 8-bit ASCII content?

file encoding utf-8 utf-16

asked Nov 21 '09 at 14:29

Franck Freiburger

1 answer

UTF-16 to UTF-8 conversion in JavaScript

I have Base64-encoded data that is in UTF-16. I am trying to decode the data, but most libraries only support UTF-8. I believe I have to drop the null bytes, but I am unsure how. Currently I am using David Chambers' Polyfill for Base64, but I have also…

javascript utf-8 base64 utf-16

asked Jan 29 '13 at 21:11

Don P

2 answers

How to get a reliable unicode character count in Python?

Google App Engine uses Python 2.5.2, apparently with UCS4 enabled. But the GAE datastore uses UTF-8 internally. So if you store u'\ud834\udd0c' (length 2) to the datastore, when you retrieve it, you get '\U0001d10c' (length 1). I'm trying to count…

python google-app-engine unicode utf-16 utf-32

asked Aug 03 '11 at 06:26

Travis

1 answer

Conversion from wstring to u16string and back (standard conform) in C++17 / C++20

My main platform is Windows, which is why I use UTF-16 internally (mostly BMP strings). I would like to use console output for these strings. Unfortunately there is no std::u16cout or std::u8cout, so I need to use std::wcout. Therefore I…

c++ c++17 utf-16 wstring utf-32

asked Apr 20 '20 at 13:19

Bernd

1 answer

Python3 reading mixed text/binary data line-by-line

I need to parse a file that has a UTF-16 text header followed directly by binary data. To be able to read the binary data, I open the file in "rb" mode, then, for reading the header, wrap it in an io.TextIOWrapper(). The problem is that when I…

python python-3.x text io utf-16

asked Sep 06 '18 at 09:17

itecMemory

0 answers

SonarQube - Unable to analyse xml and xsd file, with UTF-16 encoding

I'm using SonarQube (version 5.6.7) and sonar-scanner (version 3.0.3.778) to analyse some documents. Among these documents there are also .xml and .xsd files with UTF-16 encoding. When I launch my sonar-scanner command from the command line, with…

xml xsd sonarqube utf-16 sonarqube-scan

asked Jan 03 '18 at 11:17

Nicomedes E.

2 answers

How to force UTF-8 in node js with exec process?

I know the solution is very simple, but I've been banging my head against it for an hour. In Windows 10, if I launch the command "dir", I get this result: Il volume nell'unità D non ha etichetta. (Italian: "Volume in drive D has no label.") In Node.js I try to exec the dir command in this way: var child =…

node.js utf-8 exec utf-16

asked Oct 06 '17 at 10:17

Janka

2 answers

Converting wstring to lower case

I want to convert a wstring to lower case. I found that there are a lot of answers using locale info. Is there any function like ToLower() for wstring as well?

c++ string utf-16

asked Feb 03 '17 at 10:33

msing

1 answer

Truncated Read With UTF-16-Encoded Text in C++

My goal is to convert external input sources to a common, UTF-8 internal encoding, since it is compatible with many libraries I use (such as RE2) and is compact. Since I do not need to do string slicing except with pure ASCII, UTF-8 is an ideal…

c++ c++11 encoding utf-8 utf-16

asked Sep 12 '16 at 00:02

Alex Huszagh

4 answers

Looking for a good 64 bit hash for file paths in UTF16

I have a Unicode / UTF-16 encoded path. The path delimiter is U+005C '\'. The paths are null-terminated, root-relative Windows file system paths, e.g. "\windows\system32\drivers\myDriver32.sys". I want to hash this path into a 64-bit unsigned…

hash path collision utf-16 hash-collision

asked Sep 15 '10 at 20:12

Dominik Weber

1 answer

How to convert from utf-16 to utf-32 on Linux with std library?

On MSVC, converting UTF-16 to UTF-32 is easy with C++11's codecvt_utf16 locale facet. But in GCC (gcc (Debian 4.7.2-5) 4.7.2) this new feature seemingly hasn't been implemented yet. Is there a way to perform such a conversion on Linux without iconv…

c++ gcc unicode utf-16

asked May 28 '14 at 18:46

Al Berger

5 answers

Is it possible to reliably auto-decode user files to Unicode? [C#]

I have a web application that allows users to upload their content for processing. The processing engine expects UTF8 (and I'm composing XML from multiple users' files), so I need to ensure that I can properly decode the uploaded files. Since I'd…

c# string utf-8 multilingual utf-16

asked Feb 22 '10 at 20:58

NVRAM

5 answers

Convert Short Array to String C#

Is it possible to convert a short array to a string and then show the text? short[] a = new short[] {0x33, 0x65, 0x66, 0xE62, 0xE63}; The array contains UTF-16 code units (including Thai characters). How can I output it and show the Thai and English words? Thank…

c# utf-16

asked Apr 04 '13 at 15:18

Fusionmate
