Questions tagged [unicode]

Unicode is a standard for encoding, representing and handling text. It aims to support every character needed for written text across all writing systems, including technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

U+0041 A
U+0042 B
U+0043 C
...
U+039B Λ
U+039C Μ
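
As a rough illustration (a Python 3 sketch, not part of the standard itself), the built-in `ord` and `chr` functions convert between a character and its code point. The helper name `code_point_label` is hypothetical and exists only for this example.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"

# "\u039b" and "\u039c" are the Greek capitals from the list above,
# written as escapes so the source stays unambiguous.
for ch in "ABC\u039b\u039c":
    print(code_point_label(ch), ch)

# chr() goes the other way, from code point to character.
assert chr(0x0041) == "A"
```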

Unicode Transformation Formats

UTFs describe how to encode code points as byte sequences. The most common forms are UTF-8 (one, two, three or four bytes per code point) and UTF-16 (two or four bytes per code point).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
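
The table rows can be reproduced with a short Python 3 sketch. `str.encode` is the standard call; the helper name `hex_bytes` is an assumption made for this example, and the third sample character (U+1F600) is added here only to show the four-byte case.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Encode text and show the resulting bytes as spaced uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

# "\u039b" is U+039B from the table; "\U0001F600" lies above U+FFFF.
for ch in ["A", "\u039b", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", hex_bytes(ch, "utf-8"), "|", hex_bytes(ch, "utf-16-be"))
```

Code points above U+FFFF take four bytes in both forms; in UTF-16 those four bytes are a surrogate pair.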

UTF FAQ, UTF-16 FAQ, UTF-8 FAQ

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
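
A minimal Python 3 sketch of two of these operations, normalization and case folding, using the standard `unicodedata` module and `str.casefold`. The helper name `same_text` is hypothetical, and Python's built-in string methods are not locale-aware, so this only approximates the locale-sensitive behavior the standard describes.

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization and case folding."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()
    return fold(a) == fold(b)

decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
composed = "\u00e9"      # precomposed LATIN SMALL LETTER E WITH ACUTE

# Different code point sequences, same text once normalized.
print(decomposed == composed, same_text(decomposed, composed))
```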

Latest Version of the Standard

Identifying Characters

For more general information, see the Unicode article on Wikipedia.

24916 questions

1 answer

cout<< "привет"; or wcout<< L"привет";

Why does cout<< "привет"; work well while wcout<< L"привет"; does not? (In Qt Creator for Linux.)

c++ linux unicode console-application

asked Sep 07 '13 at 16:57

Minimus Heximus

1 answer

Serializing to JSON that would retain Hebrew characters

I have the following use case: from data I produce a JSON document, part of it Hebrew words. For example: import json j = {} city =u'חיפה' #native unicode j['results']= [] j['results'].append({'city':city}) #Also tried to city.encode('utf-8') and…

python json python-2.7 unicode

asked Aug 28 '13 at 06:39

alonisser

3 answers

MySQL - Illegal mix of collations (utf8_general_ci,COERCIBLE) and (latin1_swedish_ci,IMPLICIT) for operation 'UNION'

How do I fix that error once and for all? I just want to be able to do unions in MySQL. (I'm looking for a shortcut, like an option to make MySQL ignore that issue or take its best guess, not looking to change collations on 100s of tables ... at…

sql mysql unicode union collation

asked Oct 08 '08 at 15:42

Greg

7 answers

Converting a \u escaped Unicode string to ASCII

After reading all about iconv and Encoding, I am still confused. I am scraping the source of a web page and I have a string that looks like this: 'pretty\u003D\u003Ebig' (displayed in the R console as 'pretty\\\u003D\\\u003Ebig'). I want to convert this…

r unicode text-processing iconv unicode-string

asked Jul 20 '13 at 11:39

seancarmody

4 answers

Print Unicode characters PHP

I have a database which stores video game names with Unicode characters but I can't figure out how to properly escape these Unicode characters when printing them to an HTML response. For instance, when I print all games with the name like…

php unicode html-escape-characters

asked Jul 09 '13 at 03:29

Cameron Tinker

4 answers

Using Unicode characters bigger than 2 bytes with .NET

I'm using this code to generate U+10FFFC var s = Encoding.UTF8.GetString(new byte[] {0xF4,0x8F,0xBF,0xBC}); I know it's for private-use and such, but it does display a single character as I'd expect when displaying it. The problems come when…

c# .net unicode char utf-16

asked May 29 '13 at 14:24

Earlz

4 answers

Delphi Unicode String Length in Bytes

I'm working on porting some Delphi 7 code to XE4, so Unicode is the subject here. I have a method where a string gets written to a TMemoryStream, so according to this Embarcadero article, I should multiply the length of the string (in characters)…

delphi unicode delphi-xe4

asked May 13 '13 at 19:50

Jessica Brown

3 answers

How to handle Unicode (non-ASCII) characters in Python?

I'm programming in Python and I'm obtaining information from a web page through the urllib2 library. The problem is that the page can provide me with non-ASCII characters, like 'ñ', 'á', etc. At the very moment urllib2 gets this character, it…

python unicode character-encoding

asked Oct 29 '09 at 15:42

Roman

2 answers

When I type non-ASCII characters using a Windows keyboard I get "?"

When I type non-ASCII characters using a Windows keyboard (in the language bar), I get question marks ? where the non-ASCII characters should go. Copy-and-paste works fine and the Unicode characters are displayed in the Text widget. I am using the…

python unicode tkinter keyboard

asked Apr 14 '13 at 22:06

Biagio Arobba

2 answers

How to decode unicode HTML by JavaScript?

How can I use JavaScript to decode from: \u003cb\u003estring\u003c/b\u003e to string? (I searched the internet and found some sites with similar questions, such as "Javascript html decoding" or "How to decode HTML entities", but they don't have the same encode…

javascript unicode decode

asked Apr 10 '13 at 15:08

NoName

2 answers

python 2.7 string.join() with unicode

I have a bunch of byte strings (str, not unicode, in Python 2.7) containing Unicode data (in UTF-8 encoding). I am trying to join them (by "".join(utf8_strings) or u"".join(utf8_strings)), which throws UnicodeDecodeError: 'ascii' codec can't decode…

python unicode

asked Feb 07 '13 at 18:50

thkang

2 answers

UTF-8: how many bytes are used by languages to represent a visible character?

Is there a table or something similar that shows how many bytes different languages need on average to represent a visible character (glyph) when the encoding is UTF-8?

unicode utf-8 character byte glyph

asked Jan 23 '13 at 17:21

sid_com

1 answer

std::string, wstring, u16/32string clarification

My current understanding of the difference between std::string and std::wstring is simply the buffer's type; namely, char vs wchar_t, respectively. I've also read that most (if not all) Linux distros use char for any and all strings, both ASCII as…

c++ string unicode std

asked Jan 21 '13 at 11:59

Qix - MONICA WAS MISTREATED

9 answers

Detect if a user has typed an emoji character in UITextView

I have a UITextView and I need to detect if a user enters an emoji character. I would think that just checking the unicode value of the newest character would suffice but with the new emoji 2s, some characters are scattered all throughout the…

ios objective-c unicode detect emoji

asked Jan 15 '13 at 00:10

Albert Renshaw

1 answer

Seeking istreambuf_iterator clarifications, reading a complete text file of Unicode characters

In the book "Effective STL" by Scott Meyers, there is a nice example of reading an entire text file into a std::string object: std::string sData; /*** Open the file for reading, binary mode ***/ std::ifstream ifFile ("MyFile.txt",…

c++ unicode wstring istream-iterator wifstream

asked Jan 05 '13 at 01:34

Chris Wiesner
