Questions tagged [utf-16]

UTF-16 is a variable-width character encoding that represents each Unicode code point as one or two 16-bit code units, that is, as either 2 or 4 bytes.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
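
The encoding step can be sketched roughly as follows in Python. This is a minimal illustration under stated assumptions, not text from the RFC: the function name `encode_utf16_code_units` is hypothetical, and the sketch returns 16-bit code units rather than bytes, so byte order is left out.

```python
def encode_utf16_code_units(code_point: int) -> list:
    """Return the UTF-16 code units (16-bit integers) for one code point.

    Hypothetical helper written for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Basic Multilingual Plane: a single code unit (2 bytes).
        return [code_point]
    # Supplementary planes: split the 20-bit offset into a surrogate pair (4 bytes).
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


print([hex(u) for u in encode_utf16_code_units(0x41)])     # ['0x41']
print([hex(u) for u in encode_utf16_code_units(0x1F64F)])  # ['0xd83d', '0xde4f']
```

A code point below U+10000 yields one code unit; anything above yields a high surrogate followed by a low surrogate.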

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and UTF-16 with a byte order mark (BOM).
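
To see how the three flavors differ at the byte level, here is a small Python sketch using the standard codec names `utf-16-le`, `utf-16-be`, and `utf-16`. The sample string is an arbitrary choice for this example.

```python
import codecs

text = "A\u20ac"  # 'A' (U+0041) and the euro sign (U+20AC), both in the BMP

little = text.encode("utf-16-le")  # no BOM, least significant byte first
big = text.encode("utf-16-be")     # no BOM, most significant byte first
with_bom = text.encode("utf-16")   # BOM first, then the platform's byte order

print(little.hex())  # 4100ac20
print(big.hex())     # 004120ac

# The generic "utf-16" codec reads the BOM to pick the byte order when decoding.
decoded = (codecs.BOM_UTF16_BE + big).decode("utf-16")
print(decoded == text)  # True
```

The BOM is the code point U+FEFF written in the chosen byte order, so a reader can tell the two orders apart by checking whether the stream starts with `FE FF` or `FF FE`.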

1193 questions
6
votes
2 answers

Convert � � to Emoji in HTML using PHP

We have a bunch of surrogate-pair (or 2-byte UTF-8?) characters such as ��, which is the prayer hands emoji, stored as UTF-8 as 2 characters. When rendered in a browser, this string renders as two ?? example: I need to convert those to…
Tyler F
6
votes
1 answer

Efficient binary-to-string formatting (like base64, but for UTF8/UTF16)?

I have many bunches of binary data, ranging from 16 to 4096 bytes, which need to be stored to a database and which should be easily comparable as a unit (e.g. two bunches of data match only if the lengths match and all bytes match). Strings are…
supercat
6
votes
0 answers

How to print UTF-16 characters in C

int main() { char c = 0x41; printf("char is : %c\n",c); c = 0xe9; printf("char is : %c\n",c); unsigned int d = 0x164e; printf("char is : %c\n",d); return 0; } What I want to print out are: I use Ubuntu 64-bit…
Patrick
6
votes
1 answer

Any way to convert a regular string in ActionScript 3 to a ByteArray of Latin-1 Character Codes?

I am having no problem converting a string to a byteArray of UTF-16 encoded characters, but the application I am trying to communicate with (written in Erlang) only understands Latin-1 encoding. Is there any way of producing a byteArray full of…
Mike Keen
6
votes
1 answer

How to read a UTF-16 file into a UTF-8 std::string line by line

I'm working with code that expects utf8-encoded std::string variables. I want to be able to handle a user-supplied file that potentially has utf-16 encoding (I don't know the encoding at design time, but eventually want to be able to deal with…
Hoobajoob
6
votes
5 answers

How to convert a Unicode string into a UTF-8 or UTF-16 string?

How do I convert a Unicode string into a UTF-8 or UTF-16 string? My VS2005 project uses the Unicode character set, while the SQLite C/C++ interface provides int sqlite3_open( const char *filename, /* Database filename (UTF-8) */ sqlite3 **ppDb /* OUT:…
user25749
6
votes
2 answers

Does std::wstring support UTF-16 and UTF-32 on Windows?

I'm learning about Unicode and have a few questions that I'm hoping to get answered. 1) I've read that on Linux, a std::wstring is 4-bytes, while on Windows, it's 2-bytes. Does this mean that Linux internal support is UTF-32 while Windows it is…
Caroline Beltran
6
votes
3 answers

Difference between composite characters and surrogate pairs

In Unicode what is the difference between composite characters and surrogate pairs? To me they sound like similar things - two characters to represent one character. What differentiates these two concepts?
Sachin Kainth
6
votes
4 answers

How do I convert a string in UTF-16 to UTF-8 in C++

Consider: STDMETHODIMP CFileSystemAPI::setRRConfig( BSTR config_str, VARIANT* ret ) { mReportReaderFactory.reset( new sbis::report_reader::ReportReaderFactory() ); USES_CONVERSION; std::string configuration_str = W2A( config_str ); But in…
user3252635
6
votes
2 answers

Search or compare within a Grapheme Cluster in Korean

In my current implementation of a UISearchBarController I'm using [NSString compare:] inside the filterContentForSearchText:scope: delegate method to return relevant objects based on their name property to the results UITableView as you start…
Jessedc
6
votes
2 answers

Substring or characterAt method for UTF-8 strings with 2+ bytes in Java

I'm trying to find a substring method or characterAt method that works on a string containing UTF-8 encoded text in Java. Internally, Java works with UTF-16. This means that a String is composed of chars with a size of 2 bytes. A UTF-8 character can…
Wouter
6
votes
2 answers

Prevent XSLT transform from converting utf-8 XML into utf-16?

In Delphi XE2, I'm doing an XSLT transform on a received XML file to remove all namespace information. Problem: It changes into This is the XML that I get back from…
Jan Doggen
6
votes
1 answer

Send an SMS message (UTF-16) with an unknown character replaced by a "replacement character" in Android

I have a problem with sending SMS messages. I created a string with characters like "\uFDE8" (it's 65000). When I convert it back, I get 65000. It looks OK. But when I send an SMS with this string and receive the message, I have this character…
hevy
5
votes
1 answer

How to use Ruby to replace text in a VC++ resource file, when the encoding is all wacked out?

I have a plain managed VC++ project in a solution. It has a resource file, app.rc, that is used to store the assembly info (version, product, copyright, etc). If I open the file in my text editor, it says it's a Unicode (UTF-16 LE BOM). And Visual…
Anthony Mastrean
5
votes
1 answer

What advantage is there to using UTF-8 over UTF-16?

Possible Duplicate: UTF8, UTF16, and UTF32 I am always reading things saying to write my source code in UTF-8 and stay away from other encodings, but it also seems like UTF-16 is an improved version of UTF-8. What is the difference between them,…
Orcris