Questions tagged [utf-16]

UTF-16 is a variable-width character encoding that represents each Unicode code point as one or two 16-bit code units, that is, as a byte sequence of either two or four bytes.

The algorithm for encoding code points as UTF-16 is described in RFC 2781.
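As a rough illustration of the encoding step, the following Python sketch maps a single code point to its UTF-16 code units. The function name `encode_utf16_code_units` is made up for this example, and the sketch is an approximation of the surrogate-pair arithmetic rather than a substitute for the RFC text.

```python
def encode_utf16_code_units(code_point):
    """Return the UTF-16 code units (16-bit integers) for one code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")

    # Code points below U+10000 fit in a single 16-bit code unit.
    if code_point < 0x10000:
        return [code_point]

    # Otherwise subtract 0x10000 to get a 20-bit value and split it
    # into two 10-bit halves carried by a high and a low surrogate.
    offset = code_point - 0x10000
    high_surrogate = 0xD800 | (offset >> 10)
    low_surrogate = 0xDC00 | (offset & 0x3FF)
    return [high_surrogate, low_surrogate]


if __name__ == "__main__":
    # U+0041 needs one code unit, U+1F600 needs a surrogate pair.
    print([hex(u) for u in encode_utf16_code_units(0x41)])
    print([hex(u) for u in encode_utf16_code_units(0x1F600)])
```

The result can be cross-checked against Python's built-in codec, for example by comparing with `"\U0001F600".encode("utf-16-be")`.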

There are three flavors of UTF-16: little-endian (UTF-16LE), big-endian (UTF-16BE), and UTF-16 with a byte order mark (BOM).
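To make the difference between the flavors concrete, here is a small Python sketch using the standard `codecs` module and the built-in `utf-16-le`, `utf-16-be` and `utf-16` codecs. The helper `detect_utf16_byte_order` is a hypothetical name introduced for this example, and it only inspects the first two bytes.

```python
import codecs

text = "A"

# Explicit byte order, no BOM is written.
little_endian = text.encode("utf-16-le")   # b'A\x00'
big_endian = text.encode("utf-16-be")      # b'\x00A'

# The plain "utf-16" codec prepends a BOM and uses the platform's
# native byte order, so the exact bytes depend on the machine.
with_bom = text.encode("utf-16")


def detect_utf16_byte_order(data):
    """Guess the byte order of UTF-16 data from a leading BOM.

    Hypothetical helper: returns "little-endian", "big-endian",
    or "unknown (no BOM)".
    """
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big-endian"
    return "unknown (no BOM)"


if __name__ == "__main__":
    print(little_endian, big_endian, with_bom)
    print(detect_utf16_byte_order(with_bom))
```

When decoding, the `utf-16` codec consumes the BOM and picks the byte order from it, whereas `utf-16-le` and `utf-16-be` assume the order named in the codec.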

1193 questions
8
votes
2 answers

How to output Byte Order Mark when writing to TextWriter?

I am writing text to a TextWriter. I want the UTF-16 Byte Order Mark (BOM) to appear in the output: public void ProcessRequest(HttpContext context) { context.Response.ContentEncoding = new UnicodeEncoding(true, true); …
Ian Boyd
  • 246,734
  • 253
  • 869
  • 1,219
8
votes
3 answers

Correctly reading a utf-16 text file into a string without external libraries?

I've been using StackOverflow since the beginning, and have on occasion been tempted to post questions, but I've always either figured them out myself or found answers posted eventually... until now. This feels like it should be fairly simple, but…
neminem
  • 2,658
  • 5
  • 27
  • 36
7
votes
4 answers

Read Unicode files C++

I have a simple question to ask. I have a UTF-16 text file to read which starts with FFFE. What are the C++ tools to deal with this kind of file? I just want to read it, filter some lines, and display the result. It looks simple, but I just have…
Andres
  • 3,324
  • 6
  • 27
  • 32
7
votes
6 answers

What could go wrong in switching HTML encoding from UTF-8 to UTF-16?

What are the implications of a change from UTF-8 to UTF-16 for HTML encoding? I would like to know your thoughts on the issue. Are there things I need to think of before making such a change? Note: Interested due to enormous amounts of Japanese…
Newbie
  • 7,031
  • 9
  • 60
  • 85
7
votes
3 answers

How do you get Matlab to write the BOM (byte order markers) for UTF-16 text files?

I am creating UTF16 text files with Matlab, which I am later reading in using Java. In Matlab, I open a file called fileName and write to it as follows: fid = fopen(fileName, 'w','n','UTF16-LE'); fprintf(fid,"Some stuff."); In Java, I can read the…
Richard Povinelli
  • 1,419
  • 1
  • 14
  • 28
7
votes
3 answers

Storing UTF-16/Unicode data in SQL Server

According to this, SQL Server 2K5 uses UCS-2 internally. It can store UTF-16 data in UCS-2 (with appropriate data types, nchar etc), however if there is a supplementary character this is stored as 2 UCS-2 characters. This brings the obvious issues…
David Cameron
7
votes
1 answer

Emojis to/from codepoints in Javascript

In a hybrid Android/Cordova game that I am creating I let users provide an identifier in the form of an Emoji + an alphanumeric - i.e. 0..9,A..Z,a..z - name. For example ‍️Stackoverflow Server-side the user identifiers are stored with the Emoji and…
DroidOS
  • 8,530
  • 16
  • 99
  • 171
7
votes
8 answers

Are UTF16 (as used by for example wide-winapi functions) characters always 2 byte long?

Please clarify for me, how does UTF16 work? I am a little confused, considering these points: There is a static type in C++, WCHAR, which is 2 bytes long. (always 2 bytes long, obviously) (UPDATE: as shown by the answers, this assumption was…
Cray
  • 2,396
  • 19
  • 29
7
votes
1 answer

UTF-16 codepoint counting in python

I'm getting some data from an API (telegram-bot) I'm using. I'm using the python-telegram-bot library, which interacts with the Telegram Bot API. The data is returned in the UTF-8 encoding in JSON format. Example (snippet): {'message': {'text':…
jsmnbom
  • 138
  • 8
7
votes
1 answer

Unable to set UTF-16 as locale

I'm unable to set UTF-16, or any form thereof, as locale on my Linux box. The sample code for this: #include #include #include using namespace std; int main() { char *ret = std::setlocale(LC_ALL,…
Maddy
  • 1,319
  • 3
  • 22
  • 37
7
votes
1 answer

Emoji doesn't show in NSAttributedString when typed using iOS keyboard, does when typed on Android

I'm making a messaging application and when I send an emoji from the Android side, it shows fine on the iOS side, yet the iOS side cannot (it seems) display emojis from iOS's own keyboard! The label in which I am showing the emoji uses attributed…
Hamzah Malik
  • 2,540
  • 3
  • 28
  • 46
7
votes
3 answers

How can I decode UTF-16 data in Perl when I don't know the byte order?

If I open a file ( and specify an encoding directly ) : open(my $file,"<:encoding(UTF-16)","some.file") || die "error $!\n"; while(<$file>) { print "$_\n"; } close($file); I can read the file contents nicely. However, if I do: use…
Geo
  • 93,257
  • 117
  • 344
  • 520
7
votes
2 answers

MD5 value mismatch between SQL server and PostgreSQL

In order to write some code to do a consistency check of data stored in both SQL Server and PostgreSQL, I plan to calculate the MD5 on table data for both databases, and verify that they are equal. This works fine as long as data is plain text (…
LKB
  • 127
  • 2
  • 8
7
votes
2 answers

Java, JavaCC: How to parse characters outside the BMP?

I am referring to the XML 1.1 spec. Look at the definition of NameStartChar: NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |…
java.is.for.desktop
  • 10,748
  • 12
  • 69
  • 103
7
votes
2 answers

UTF-8 to UTF-16LE Javascript

I need to convert a UTF-8 string to UTF-16LE in JavaScript, like the iconv() PHP function does, i.e.: iconv("UTF-8", "UTF-16LE", $string); The output should be like this: 49 00 6e 00 64 00 65 00 78 00 I found this func to decode UTF-16LE and it's…
Keveun
  • 153
  • 1
  • 7