Questions tagged [unicode]

Unicode is a standard for encoding, representing and handling text. It aims to support every character needed for written text across all writing systems, including technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

  • U+0041 A
  • U+0042 B
  • U+0043 C
  • ...
  • U+039B Λ
  • U+039C Μ
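As a rough illustration of the mapping above, the following Python 3 sketch converts between characters and code points using the built-in `ord()` and `chr()`. The helper name `code_point_label` is made up for this example.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"


# \u039B and \u039C are the Greek capitals lambda and mu from the list above.
for ch in "ABC\u039B\u039C":
    print(code_point_label(ch), ch)

# chr() goes the other way: from a code point back to the character.
lambda_char = chr(0x039B)
```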

Unicode Transformation Formats

UTFs describe how to encode code points as byte sequences. The most common forms are UTF-8 (which encodes each code point as one to four bytes) and UTF-16 (which encodes each code point as two bytes, or as four bytes via a surrogate pair for code points above U+FFFF).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
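The byte values in the table can be checked with Python 3's built-in codecs. This is only a sketch: the helper name `hex_bytes` is hypothetical, and U+1F600 (not part of the table) is added as an assumption-free example of a code point that needs the four-byte forms.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Encode text and return its bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


# One row per code point: label, UTF-8 bytes, UTF-16 big-endian bytes.
for ch in ["A", "\u039B", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", hex_bytes(ch, "utf-8"), "|", hex_bytes(ch, "utf-16-be"))
```

The `utf-16-be` codec is used so that no byte order mark is prepended; plain `utf-16` would add one.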

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
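Two of these operations, normalization and case folding, can be tried with Python 3's standard library. The sketch below is a simplified comparison rather than a full implementation of the standard's caseless matching, and the function name `loosely_equal` is invented for this example.

```python
import unicodedata


def loosely_equal(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization and case folding."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()
    return fold(a) == fold(b)


composed = "\u00E9"      # e with acute accent as a single code point
decomposed = "e\u0301"   # letter e followed by a combining acute accent

# The two spellings differ code point by code point, but normalize to the same text.
print(composed == decomposed)
print(loosely_equal(composed, decomposed))
```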

Identifying Characters

For more general information, see the Unicode article on Wikipedia.

24916 questions
200
votes
12 answers

Convert Unicode to ASCII without errors in Python

My code just scrapes a web page, then converts it to Unicode. html = urllib.urlopen(link).read() html.encode("utf8","ignore") self.response.out.write(html) But I get a UnicodeDecodeError: Traceback (most recent call last): File…
themirror
  • 9,963
  • 7
  • 46
  • 79
195
votes
6 answers

Insert Unicode character into JavaScript

I need to insert an Omega (Ω) onto my html page. I am using its HTML escaped code to do that, so I can write &Omega; and get Ω. That's all fine and well when I put it into a HTML element; however, when I try to put it into my JS, e.g. var Omega =…
Bluefire
  • 13,519
  • 24
  • 74
  • 118
191
votes
5 answers

How well is Unicode supported in C++11?

I've read and heard that C++11 supports Unicode. A few questions on that: How well does the C++ standard library support Unicode? Does std::string do what it should? How do I use it? Where are potential problems?
Ralph Tandetzky
  • 22,780
  • 11
  • 73
  • 120
187
votes
5 answers

Difference between BYTE and CHAR in column datatypes

In Oracle, what is the difference between : CREATE TABLE CLIENT ( NAME VARCHAR2(11 BYTE), ID_CLIENT NUMBER ) and CREATE TABLE CLIENT ( NAME VARCHAR2(11 CHAR), -- or even VARCHAR2(11) ID_CLIENT NUMBER )
Guido
  • 46,642
  • 28
  • 120
  • 174
187
votes
7 answers

What is a "surrogate pair" in Java?

I was reading the documentation for StringBuffer, in particular the reverse() method. That documentation mentions something about surrogate pairs. What is a surrogate pair in this context? And what are low and high surrogates?
Raymond
  • 2,004
  • 2
  • 13
  • 10
185
votes
7 answers

What is the difference between encode/decode?

I've never been sure that I understand the difference between str/unicode decode and encode. I know that str().decode() is for when you have a string of bytes that you know has a certain character encoding, given that encoding name it will return a…
ʞɔıu
  • 47,148
  • 35
  • 106
  • 149
185
votes
7 answers

NameError: global name 'unicode' is not defined - in Python 3

I am trying to use a Python package called bidi. In a module in this package (algorithm.py) there are some lines that give me error, although it is part of the package. Here are the lines: # utf-8 ? we need unicode if isinstance(unicode_or_str,…
TJ1
  • 7,578
  • 19
  • 76
  • 119
183
votes
3 answers

Difference between Char.IsDigit() and Char.IsNumber() in C#

What's the difference between Char.IsDigit() and Char.IsNumber() in C#?
Guy
  • 65,082
  • 97
  • 254
  • 325
181
votes
17 answers

String length in bytes in JavaScript

In my JavaScript code I need to compose a message to server in this format: <size in bytes>CRLF <data>CRLF Example: 3 foo The data may contain unicode characters. I need to send them as UTF-8. I'm looking for the most cross-browser way to…
Alexander Gladysh
  • 39,865
  • 32
  • 103
  • 160
180
votes
14 answers

How to prevent Unicode characters from rendering as emoji in HTML from JavaScript?

I'm finding Unicode for special characters from FileFormat.Info's search. Some characters are rendering as the classic black-and-white glyphs, such as ⚠ (warning sign, \u26A0 or &#9888;). These are preferable, since I can apply CSS styles (such as…
anon
177
votes
2 answers

What is the _snowman param in Ruby on Rails 3 forms for?

In Ruby on Rails 3 (currently using Beta 4), I see that when using the form_tag or form_for helpers there is a hidden field named _snowman with the value of ☃ (Unicode U+2603, decimal 9731) showing up. So, what is this for?
Matthew Savage
  • 3,794
  • 10
  • 43
  • 53
175
votes
6 answers

What would be the Unicode character for big bullet in the middle of the character?

I want something like 0x2022 8226 BULLET • But bigger. I can't even seem to find them at http://www.ssec.wisc.edu/~tomw/java/unicode.html What should I search for? Dots? bullets?
user4951
  • 32,206
  • 53
  • 172
  • 282
172
votes
15 answers

Python, Unicode, and the Windows console

When I try to print a string in a Windows console, sometimes I get an error that says UnicodeEncodeError: 'charmap' codec can't encode character ..... I assume this is because the Windows console cannot handle all Unicode characters. How can I work…
James Sulak
  • 31,389
  • 11
  • 53
  • 57
171
votes
9 answers

MySQL "incorrect string value" error when save unicode string in Django

I got strange error message when tried to save first_name, last_name to Django's auth_user model. Failed examples user = User.object.create_user(username, email, password) user.first_name = u'Rytis' user.last_name = u'Slatkevičius' user.save() >>>…
jack
  • 17,261
  • 37
  • 100
  • 125
170
votes
10 answers

Can I make git recognize a UTF-16 file as text?

I'm tracking a Virtual PC virtual machine file (*.vmc) in git, and after making a change git identified the file as binary and wouldn't diff it for me. I discovered that the file was encoded in UTF-16. Can git be taught to recognize that this file…
skiphoppy
  • 97,646
  • 72
  • 174
  • 218