Questions tagged [utf-16]

UTF-16 is a character encoding that represents Unicode code points using either 2 or 4 bytes per character.

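UTF-16 encodes each Unicode code point as one or two 16-bit code units, that is, as a sequence of two or four bytes. Code points up to U+FFFF take a single code unit; code points above that take a pair of code units called a surrogate pair. It is therefore a variable-width character encoding.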

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
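
As a rough illustration of that algorithm, here is a minimal Python sketch that turns one code point into its UTF-16 code units. The function name `encode_utf16_code_units` is made up for this example and is not taken from the RFC or from any library; the arithmetic follows the surrogate-pair scheme (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves).

```python
def encode_utf16_code_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units (16-bit integers) for one code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")

    # Code points below U+10000 are stored as a single code unit.
    if code_point < 0x10000:
        return [code_point]

    # Otherwise subtract 0x10000 and split the 20-bit result in two.
    offset = code_point - 0x10000
    high_surrogate = 0xD800 | (offset >> 10)   # top 10 bits
    low_surrogate = 0xDC00 | (offset & 0x3FF)  # bottom 10 bits
    return [high_surrogate, low_surrogate]


# U+0041 fits in one code unit; U+1F600 needs a surrogate pair.
print([hex(u) for u in encode_utf16_code_units(0x41)])
print([hex(u) for u in encode_utf16_code_units(0x1F600)])
```

In real code, the built-in `str.encode("utf-16-be")` or `str.encode("utf-16-le")` does this work and also handles byte order.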

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and UTF-16 with a byte order mark (BOM) that signals which byte order follows.
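
The difference between the flavors can be seen with Python's standard codecs. This is only a sketch: the sample string is arbitrary, and the BOM is prepended by hand so that the result does not depend on the byte order of the machine running it.

```python
import codecs

text = "A\u20ac"  # "A" followed by the euro sign, U+20AC

# Explicit byte order, no BOM written.
big_endian = text.encode("utf-16-be")
little_endian = text.encode("utf-16-le")

# UTF-16 with BOM: the byte order mark precedes the encoded text.
with_bom = codecs.BOM_UTF16_LE + little_endian

# The generic "utf-16" codec reads the BOM, picks the byte order, and drops it.
decoded = with_bom.decode("utf-16")

# Decoding with an explicit byte order leaves the BOM in place as U+FEFF.
decoded_keeping_bom = with_bom.decode("utf-16-le")

print(big_endian.hex(" "))
print(little_endian.hex(" "))
print(with_bom.hex(" "))
print(decoded == text, decoded_keeping_bom.startswith("\ufeff"))
```

This is also why a BOM can survive a conversion to another encoding: if the input is decoded with a fixed byte order rather than the BOM-aware codec, U+FEFF stays in the text as an ordinary character.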

1193 questions
27
votes
4 answers

What version of Unicode is supported by which .NET platform and on which version of Windows in regards to character classes?

Updated question ¹ With regards to character classes, comparison, sorting, normalization and collations, what Unicode version or versions are supported by which .NET platforms? Original question I remember somewhat vaguely having read that .NET…
Abel
  • 56,041
  • 24
  • 146
  • 247
27
votes
3 answers

Does Unicode have a defined maximum number of code points?

I have read many articles in order to know what is the maximum number of the Unicode code points, but I did not find a final answer. I understood that the Unicode code points were minimized to make all of the UTF-8 UTF-16 and UTF-32 encodings able…
user4344762
26
votes
2 answers

Convert UTF-16 to UTF-8 and remove BOM?

We have a data entry person who encoded in UTF-16 on Windows and would like to have utf-8 and remove the BOM. The utf-8 conversion works but BOM is still there. How would I remove this? This is what I currently…
timpone
  • 19,235
  • 36
  • 121
  • 211
26
votes
1 answer

Valid Locale Names

How do you find valid locale names? I am currently using Mac OS X. But information about other platforms would also be useful. #include #include int main(int argc,char* argv[]) { try { std::wifstream data; …
Martin York
  • 257,169
  • 86
  • 333
  • 562
25
votes
6 answers

Emoji value range

I was trying to strip all emoji characters from a string (like a sanitizer), but I cannot find a complete set of emoji values. What is the complete set of emoji characters' UTF-16 values?
SL988
  • 275
  • 1
  • 3
  • 8
25
votes
2 answers

How does Java store UTF-16 characters in its 16-bit char type?

According to the Java SE 7 Specification, Java uses the Unicode UTF-16 standard to represent characters. When imagining a String as a simple array of 16-bit variables each containing one character, life is simple. Unfortunately, there are code…
Kierrow
  • 685
  • 1
  • 7
  • 14
24
votes
5 answers

How do I encode/decode UTF-16LE byte arrays with a BOM?

I need to encode/decode UTF-16 byte arrays to and from java.lang.String. The byte arrays are given to me with a Byte Order Marker (BOM), and I need to encode byte arrays with a BOM. Also, because I'm dealing with a Microsoft client/server, I'd like…
Jared Oberhaus
  • 14,547
  • 4
  • 56
  • 55
23
votes
5 answers

How to read utf16 text file to string in golang?

I can read the file into a byte array, but when I convert it to a string it treats the UTF-16 bytes as ASCII. How do I convert it correctly? package main import ("fmt" "os" "bufio" ) func main(){ // read whole the file f, err := os.Open("test.txt") …
CL So
  • 3,647
  • 10
  • 51
  • 95
21
votes
3 answers

What Character Encoding is best for multinational companies

If you had a website that was to be translated into every language in the world and therefore had a database with all these translations what character encoding would be best? UTF-128? If so do all browsers understand the chosen encoding? Is…
HGPB
  • 4,346
  • 8
  • 50
  • 86
21
votes
2 answers

Python - Decode UTF-16 file with BOM

I have a UTF-16 LE file with BOM. I'd like to flip this file in to UTF-8 without BOM so I can parse it using Python. The usual code that I use didn't do the trick, it returned unknown characters instead of the actual file contents. f =…
Dustin
  • 6,207
  • 19
  • 61
  • 93
21
votes
3 answers

Using JNA to get/set application identifier

Following up on my previous question concerning the Windows 7 taskbar, I would like to diagnose why Windows isn't acknowledging that my application is independent of javaw.exe. I presently have the following JNA code to obtain the…
Paul Lammertsma
  • 37,593
  • 16
  • 136
  • 187
20
votes
5 answers

Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI

I'm working on an English-only C++ program for Windows where we were told "always use std::wstring", but it seems like nobody on the team really has much of an understanding beyond that. I already read the question titled "std::wstring VS…
Dave
  • 885
  • 2
  • 10
  • 20
20
votes
6 answers

UnicodeDecodeError when performing os.walk

I am getting the error: 'ascii' codec can't decode byte 0x8b in position 14: ordinal not in range(128) when trying to do os.walk. The error occurs because some of the files in a directory have the 0x8b (non-utf8) character in them. The files come…
Scott
  • 1,333
  • 1
  • 14
  • 19
19
votes
3 answers

What should I use? UTF8 or UTF16?

I have to distribute my app internationally. Let's say I have a control (like a memo) where the user enters some text. The user can be Japanese, Russian, Canadian, etc. I want to save the string to disk as TXT file for later use. I will use MY OWN…
Gabriel
  • 20,797
  • 27
  • 159
  • 293
19
votes
5 answers

How to reduce memory footprint on .NET string intensive applications?

I have an application that has ~1,000,000 strings in memory for performance reasons. My application consumes ~200 MB RAM. I want to reduce the amount of memory consumed by the strings. I know .NET represents strings in UTF-16 encoding (2 byte per…
DxCK
  • 4,402
  • 7
  • 50
  • 89