Questions tagged [unicode]

Unicode is a standard for encoding, representing and handling text. It aims to support every character required for written text, covering all writing systems, technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

  • U+0041 A
  • U+0042 B
  • U+0043 C
  • ...
  • U+039B Λ
  • U+039C Μ
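As a rough illustration of the mapping above, the following Python sketch prints the code point and official character name for the same characters. It relies only on the standard library (ord, chr and unicodedata.name); the helper name describe is made up for this example.

```python
import unicodedata

def describe(ch):
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The code points listed above: A, B, C, then two Greek capitals.
for cp in (0x0041, 0x0042, 0x0043, 0x039B, 0x039C):
    ch = chr(cp)  # code point -> character
    print(describe(ch), ch)
```

ord and chr are inverses of each other, so chr(ord(ch)) == ch holds for any single character.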

Unicode Transformation Formats

A Unicode Transformation Format (UTF) describes how to encode code points as bytes. The most common forms are UTF-8 (which encodes each code point as one, two, three or four bytes) and UTF-16 (which encodes each code point as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
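The table can be reproduced with a short Python sketch using str.encode with the "utf-8" and "utf-16-be" codecs. The helper names hex_bytes and encodings are assumptions made for this example, and U+1F600 is added only to show a four-byte case in both formats.

```python
def hex_bytes(data):
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

def encodings(cp):
    """Return the (UTF-8, UTF-16 big-endian) byte strings for a code point."""
    ch = chr(cp)
    return hex_bytes(ch.encode("utf-8")), hex_bytes(ch.encode("utf-16-be"))

for cp in (0x0041, 0x039B, 0x1F600):
    utf8, utf16 = encodings(cp)
    print(f"U+{cp:04X}  {utf8:<12} {utf16}")
```

Code points above U+FFFF, such as U+1F600, need four bytes in UTF-16 because they are stored as a surrogate pair.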

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
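A minimal Python sketch of two of these operations, normalization and case folding, using only the standard library. The function same_text is a hypothetical helper and a simplified comparison, not the full caseless-matching algorithm from the standard; locale-sensitive sorting is not shown.

```python
import unicodedata

composed = "\u00e9"     # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# The two strings render identically but differ code point by code point.
print(composed == decomposed)

# NFC normalization composes the sequence into the single code point.
print(unicodedata.normalize("NFC", decomposed) == composed)

def same_text(a, b):
    """Simplified comparison: normalize to NFC, then case-fold both sides."""
    def fold(s):
        return unicodedata.normalize("NFC", s).casefold()
    return fold(a) == fold(b)

print(same_text("Caf\u00e9", "cafe\u0301"))
print(same_text("Stra\u00dfe", "STRASSE"))
```

Case folding is used here instead of lower() because it also handles mappings such as "ß" to "ss".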

For more general information, see the Unicode article on Wikipedia.

24916 questions
270 votes, 5 answers

Why does this code, written backwards, print "Hello World!"

Here is some code that I found on the Internet: class M‮{public static void main(String[]a‭){System.out.print(new char[] {'H','e','l','l','o',' ','W','o','r','l','d','!'});}} This code prints Hello World! onto the screen; you can see it run…
dumbPotato21
  • 5,669
  • 5
  • 21
  • 34
269 votes, 19 answers

How to convert wstring into string?

The question is how to convert wstring to string? I have the following example: #include #include int main() { std::wstring ws = L"Hello"; std::string s( ws.begin(), ws.end() ); //std::cout <<"std::string = …
BЈовић
  • 62,405
  • 41
  • 173
  • 273
268 votes, 19 answers

How do you echo a 4-digit Unicode character in Bash?

I'd like to add the Unicode skull and crossbones to my shell prompt (specifically the 'SKULL AND CROSSBONES' (U+2620)), but I can't figure out the magic incantation to make echo spit it, or any other, 4-digit Unicode character. Two-digit ones are…
masukomi
  • 10,313
  • 10
  • 40
  • 49
266 votes, 6 answers

How can I change a file's encoding with vim?

I'm used to using vim to modify a file's line endings: $ file file file: ASCII text, with CRLF line terminators $ vim file :set ff=mac :wq $ file file file: ASCII text, with CR line terminators Is it possible to use a similar process to change a…
skiphoppy
  • 97,646
  • 72
  • 174
  • 218
258 votes, 11 answers

How can I use Unicode-aware regular expressions in JavaScript?

There should be something akin to \w that can match any code-point in Letters or Marks category (not just the ASCII ones), and hopefully have filters like [[P*]] for punctuation, etc.
Amit
258 votes, 16 answers

How to check if a string in Python is in ASCII?

I want to check whether a string is in ASCII or not. I am aware of ord(), however when I try ord('é'), I get TypeError: ord() expected a character, but string of length 2 found. I understood it is caused by the way I built Python (as explained in…
Nico
  • 2,599
  • 2
  • 15
  • 5
250 votes, 8 answers

Writing Unicode text to a text file?

I'm pulling data out of a Google doc, processing it, and writing it to a file (that eventually I will paste into a Wordpress page). It has some non-ASCII symbols. How can I convert these safely to symbols that can be used in HTML source? Currently…
simon
  • 5,987
  • 13
  • 31
  • 28
239 votes, 5 answers

What is the difference between _tmain() and main() in C++?

If I run my C++ application with the following main() method everything is OK: int main(int argc, char *argv[]) { cout << "There are " << argc << " arguments:" << endl; // Loop through each argument and print its number and value for (int…
joshcomley
  • 28,099
  • 24
  • 107
  • 147
239 votes, 13 answers

How to convert a string to utf-8 in Python

I have a browser which sends utf-8 characters to my Python server, but when I retrieve it from the query string, the encoding that Python returns is ASCII. How can I convert the plain string to utf-8? NOTE: The string passed from the web is already…
Bin Chen
  • 61,507
  • 53
  • 142
  • 183
237 votes, 9 answers

What's the difference between Unicode and UTF-8?

Consider: Is it true that unicode=utf16? Many are saying Unicode is a standard, not an encoding, but most editors support save as Unicode encoding actually.
ollydbg
  • 3,475
  • 7
  • 28
  • 29
231 votes, 6 answers

Python __str__ versus __unicode__

Is there a Python convention for when you should implement __str__() versus __unicode__()? I've seen classes override __unicode__() more frequently than __str__() but it doesn't appear to be consistent. Are there specific rules when it is better…
Cory
  • 22,772
  • 19
  • 94
  • 91
223 votes, 5 answers

Is there a Unicode glyph symbol to represent "Search"?

Unicode has a million icon-like glyphs, but they're very hard to search. Is there a Unicode glyph that looks like a "Binocular" or "magnifying glass"? Or is there a symbol that's used to mean "Search", which is in Unicode?
Prasad Jadhav
  • 5,090
  • 16
  • 62
  • 80
211 votes, 7 answers

What are "connecting characters" in Java identifiers?

I am reading for SCJP and I have a question regarding this line: Identifiers must start with a letter, a currency character ($), or a connecting character such as the underscore ( _ ). Identifiers cannot start with a number! It states that a…
LuckyLuke
  • 47,771
  • 85
  • 270
  • 434
210 votes, 10 answers

(grep) Regex to match non-ASCII characters?

On Linux, I have a directory with lots of files. Some of them have non-ASCII characters, but they are all valid UTF-8. One program has a bug that prevents it working with non-ASCII filenames, and I have to find out how many are affected. I was going…
Amandasaurus
  • 58,203
  • 71
  • 188
  • 248
202 votes, 8 answers

Unicode character in PHP string

This question looks embarrassingly simple, but I haven't been able to find an answer. What is the PHP equivalent to the following C# line of code? string str = "\u1000"; This sample creates a string with a single Unicode character whose "Unicode…
Telaclavo
  • 2,529
  • 2
  • 17
  • 15