Questions tagged [unicode]

Unicode is a standard for the encoding, representation and handling of text. It aims to support every character required for written text, covering all writing systems, technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

  • U+0041 A
  • U+0042 B
  • U+0043 C
  • ...
  • U+039B Λ
  • U+039C Μ
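As a rough illustration of the mapping above, the following Python sketch prints the code point of each listed character using the built-ins `ord` and `chr`. Python is only an assumed choice of language here, and the helper name `code_point_label` is hypothetical.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    # ord() gives the integer code point; pad to at least four hex digits.
    return f"U+{ord(ch):04X}"


# The Greek letters are written as escapes so the source stays pure ASCII.
for ch in "ABC\u039b\u039c":
    print(code_point_label(ch), ch)

# chr() goes the other way: from a code point back to the character.
print(chr(0x0041))  # prints "A"
```

Note that a code point is only a number; how that number is stored as bytes depends on the encoding in use.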

Unicode Transformation Formats

UTFs describe how to encode code points as byte sequences. The most common forms are UTF-8 (which encodes each code point as one, two, three or four bytes) and UTF-16 (which encodes each code point as one or two 16-bit code units, that is, two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
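The rows of the table can be reproduced with Python's built-in codecs. This is a minimal sketch, assuming Python 3; the helper names `hex_bytes` and `encode_row` are hypothetical. The last call uses a code point outside the Basic Multilingual Plane (U+1F600, chosen only for illustration) to show the four-byte forms.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def encode_row(ch: str) -> tuple:
    """Return (code point label, UTF-8 bytes, UTF-16 big-endian bytes)."""
    return (
        f"U+{ord(ch):04X}",
        hex_bytes(ch.encode("utf-8")),
        hex_bytes(ch.encode("utf-16-be")),  # "-be" suffix: no byte order mark
    )


print(encode_row("A"))           # ('U+0041', '41', '00 41')
print(encode_row("\u039b"))      # ('U+039B', 'CE 9B', '03 9B')
print(encode_row("\U0001F600"))  # ('U+1F600', 'F0 9F 98 80', 'D8 3D DE 00')
```

In the last row, UTF-16 needs a surrogate pair (two 16-bit code units), which is why the output is four bytes long.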

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
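Two of these operations, normalization and case mapping, can be sketched with Python's standard library. This is only an illustration under the assumption of Python 3; the helper name `same_text` is hypothetical, and locale-aware sorting is left out because its results depend on the environment.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e9"     # e with acute accent as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent

print(composed == decomposed)           # False: different code point sequences
print(same_text(composed, decomposed))  # True: equal after normalization

# Case mapping is not always one-to-one: sharp s upper-cases to "SS".
print("stra\u00dfe".upper())     # STRASSE
print("Stra\u00dfe".casefold())  # strasse
```

Comparing normalized strings avoids treating visually identical text as different merely because it was typed or stored in a different form.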

For more general information, see the Unicode article on Wikipedia.

24916 questions
13 votes, 1 answer

cout<< "привет"; or wcout<< L"привет";

Why does cout << "привет"; work well while wcout << L"привет"; does not? (in Qt Creator for Linux)
Minimus Heximus
13 votes, 1 answer

Serializing to JSON that would retain Hebrew characters

I have the following use case: from data I produce JSON, part of which is Hebrew words. For example: import json j = {} city =u'חיפה' #native unicode j['results']= [] j['results'].append({'city':city}) #Also tried to city.encode('utf-8') and…
alonisser
13 votes, 3 answers

MySQL - Illegal mix of collations (utf8_general_ci,COERCIBLE) and (latin1_swedish_ci,IMPLICIT) for operation 'UNION'

How do I fix that error once and for all? I just want to be able to do unions in MySQL. (I'm looking for a shortcut, like an option to make MySQL ignore that issue or take its best guess, not looking to change collations on 100s of tables ... at…
Greg
13 votes, 7 answers

Converting a \u escaped Unicode string to ASCII

After reading all about iconv and Encoding, I am still confused. I am scraping the source of a web page and I have a string that looks like this: 'pretty\u003D\u003Ebig' (displayed in the R console as 'pretty\\\u003D\\\u003Ebig'). I want to convert this…
seancarmody
13 votes, 4 answers

Print Unicode characters PHP

I have a database which stores video game names with Unicode characters but I can't figure out how to properly escape these Unicode characters when printing them to an HTML response. For instance, when I print all games with the name like…
Cameron Tinker
13 votes, 4 answers

Using Unicode characters bigger than 2 bytes with .NET

I'm using this code to generate U+10FFFC: var s = Encoding.UTF8.GetString(new byte[] {0xF4,0x8F,0xBF,0xBC}); I know it's for private use and such, but it does display a single character as I'd expect when displaying it. The problems come when…
Earlz
13 votes, 4 answers

Delphi Unicode String Length in Bytes

I'm working on porting some Delphi 7 code to XE4, so Unicode is the subject here. I have a method where a string gets written to a TMemoryStream, so according to this Embarcadero article, I should multiply the length of the string (in characters)…
Jessica Brown
13 votes, 3 answers

How to handle Unicode (non-ASCII) characters in Python?

I'm programming in Python and I'm obtaining information from a web page through the urllib2 library. The problem is that that page can provide me with non-ASCII characters, like 'ñ', 'á', etc. In the very moment urllib2 gets this character, it…
Roman
13 votes, 2 answers

When I type non-ASCII characters using a Windows keyboard I get "?"

When I type non-ASCII characters using a Windows keyboard (in the language bar), I get question marks ? where the non-ASCII characters should go. Copy-and-paste works fine and the Unicode characters are displayed in the Text widget. I am using the…
Biagio Arobba
13 votes, 2 answers

How to decode unicode HTML by JavaScript?

How to use JavaScript to decode from: \u003cb\u003estring\u003c/b\u003e to string (I searched the internet; there are some sites with similar questions, such as: Javascript html decoding or How to decode HTML entities, but they don't have the same encode…
NoName
13 votes, 2 answers

python 2.7 string.join() with unicode

I have a bunch of byte strings (str, not unicode, in Python 2.7) containing Unicode data (in UTF-8 encoding). I am trying to join them (by "".join(utf8_strings) or u"".join(utf8_strings)), which throws UnicodeDecodeError: 'ascii' codec can't decode…
thkang
13 votes, 2 answers

UTF-8: how many bytes are used by languages to represent a visible character?

Does there exist a table or something similar which shows how many bytes different languages need on average to represent a visible character (glyph) when the encoding is UTF-8?
sid_com
13 votes, 1 answer

std::string, wstring, u16/32string clarification

My current understanding of the difference between std::string and std::wstring is simply the buffer's type; namely, char vs wchar_t, respectively. I've also read that most (if not all) Linux distros use char for any and all strings, both ASCII as…
Qix - MONICA WAS MISTREATED
13 votes, 9 answers

Detect if a user has typed an emoji character in UITextView

I have a UITextView and I need to detect if a user enters an emoji character. I would think that just checking the Unicode value of the newest character would suffice but with the new emoji 2s, some characters are scattered all throughout the…
Albert Renshaw
13 votes, 1 answer

Seeking istreambuf_iterator clarifications, reading a complete text file of Unicode characters

In the book “Effective STL” by Scott Meyers, there is a nice example of reading an entire text file into a std::string object: std::string sData; /*** Open the file for reading, binary mode ***/ std::ifstream ifFile ("MyFile.txt",…
Chris Wiesner