Questions tagged [unicode]

Unicode is a standard for encoding, representing and handling text. It is intended to support every character needed for written text, covering all writing systems as well as technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

U+0041 A
U+0042 B
U+0043 C
...
U+039B Λ
U+039C Μ
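As a small illustration of the mapping above, the following Python sketch prints the code point and official character name for a few characters. It relies only on the built-in ord() and the standard unicodedata module; the helper name describe is made up for this example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    # ord() gives the code point as an integer; format it as at least 4 hex digits.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in "ABC\u039b\u039c":
    print(describe(ch), ch)

# chr() goes the other way, from code point to character.
print(chr(0x0041))  # A
```

The names come from the Unicode Character Database bundled with the Python interpreter, so the exact set of characters it knows about depends on the Python version.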

Unicode Transformation Formats

Unicode Transformation Formats (UTFs) describe how to encode code points as sequences of bytes. The most common forms are UTF-8 (which encodes each code point as one, two, three or four bytes) and UTF-16 (which encodes each code point as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
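The rows of the table can be reproduced with a short Python sketch using str.encode(). The helper name hex_bytes is hypothetical, and U+1F600 is an extra character, not in the table, added only to show a code point that needs four bytes in both encodings.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Encode text and return the bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


for ch in ["A", "\u039b", "\U0001F600"]:
    print(
        f"U+{ord(ch):04X}",
        "| UTF-8:", hex_bytes(ch, "utf-8"),
        "| UTF-16 (big-endian):", hex_bytes(ch, "utf-16-be"),
    )

# U+0041  | UTF-8: 41          | UTF-16 (big-endian): 00 41
# U+039B  | UTF-8: CE 9B       | UTF-16 (big-endian): 03 9B
# U+1F600 | UTF-8: F0 9F 98 80 | UTF-16 (big-endian): D8 3D DE 00
```

The "utf-16-be" codec is used here because it writes big-endian bytes without a byte order mark; the plain "utf-16" codec would prepend a BOM.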

UTF FAQ, UTF-16 FAQ, UTF-8 FAQ

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
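To make normalization and case rules more concrete, here is a minimal Python sketch, assuming a simple "normalize to NFC, then case-fold" comparison is good enough for the text at hand. The helper name same_text is invented for this example, and the approach is a simplification rather than the full caseless-matching procedure defined by the standard.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization and case folding."""
    def canon(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()
    return canon(a) == canon(b)


composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

print(composed == decomposed)           # False: different code point sequences
print(same_text(composed, decomposed))  # True: identical after normalization
print(same_text("Straße", "STRASSE"))   # True: case folding maps ß to ss
```

Locale-sensitive sorting is not covered by this sketch; Python's standard library does not implement the Unicode Collation Algorithm directly.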

Latest Version of the Standard

Identifying Characters

For more general information, see the Unicode article on Wikipedia.

24916 questions

3 answers

Will everything in the standard library treat strings as unicode in Python 3.0?

I'm a little confused about how the standard library will behave now that Python (from 3.0) is Unicode-based. Will modules such as CGI and urllib use Unicode strings or will they use the new 'bytes' type and just provide encoded data?

python unicode string cgi python-3.x

asked Sep 18 '08 at 09:29

hacama

5 answers

Remove multiple BOMs from a file

I am using a JavaScript file that is a concatenation of other JavaScript files. Unfortunately, the person who concatenated these files did not use the proper encoding when reading them, and allowed a BOM for every single…

unicode byte-order-mark

asked Feb 01 '12 at 17:54

Macy Abbey

1 answer

Unicode in button title in XCode

I'm trying to display the Greek letter pi (Unicode \u03C0) on a button as the title. When I try to set the title using the drag-and-drop graphical editor, the literal text "\u03C0" shows. Is there some way to set Unicode text in the graphical editor, or do I need…

objective-c xcode button unicode title

asked Jan 18 '12 at 20:52

user1157134

2 answers

Is it safe to assume users can see unicode characters U+2716 and U+2714 in CSS content?

I want to use the characters ✖ (U+2716) and ✔ (U+2714) in my CSS for form validation purposes. Basically, if a field is valid/invalid, I use the :after pseudo-element to insert the corresponding symbol after the field. For example: .field:after { …

html css unicode

asked Jan 14 '12 at 22:53

Philip Walton

4 answers

rules for slugs and unicode

After researching a bit how different people slugify titles, I've noticed that what is often missing is how to deal with non-English titles. URL encoding is very restrictive. See http://www.blooberry.com/indexdot/html/topics/urlencoding.htm So,…

python google-app-engine url unicode friendly-url

asked May 04 '09 at 15:04

bustrofedon

1 answer

I lose “unicodeness” when qDebug()ing after instancing a QApplication

I lose the ability to print Unicode characters right after instantiating a QApplication object. Given the following code, with all the needed libraries included: int main(int argc, char** argv) { qDebug() << "aeiou áéíóú"; …

qt unicode

asked Oct 05 '11 at 20:36

user1598585

1 answer

python subprocess and unicode execv() arg 2 must contain only strings

I have a Django site where I need to call a script using subprocess. The subprocess call works when I'm using ASCII characters, but when I try to pass arguments that are UTF-8 encoded, I get an error: execv() arg 2 must contain only strings. The…

python unicode subprocess

asked Sep 30 '11 at 15:31

deecodameeko

4 answers

Python efficient obfuscation of string

I need to obfuscate lines of Unicode text to slow down those who may want to extract them. Ideally this would be done with a built-in Python module or a small add-on library; the string length will be the same or less than the original; and the…

python string unicode

asked Sep 20 '11 at 17:11

Tim

2 answers

Detecting IME input before enter pressed in Javascript

I'm not even sure if this is possible, so apologies if it's a stupid question. I've set up a keyup callback through jQuery to run a function when a user types in an input box. It works fine for English. However when inputting text in…

javascript unicode localization dom-events ime

asked Sep 06 '11 at 08:20

benui

5 answers

Can a PHP file name (or a dir in its full path) have UTF-8 characters?

I would like to access a PHP file whose name has UTF-8 characters in it. The file does not have a BOM in it. It just contains an echo statement that displays a few Unicode characters. Accessing the PHP page from the browser (Firefox 3.0.8, IE7)…

php apache unicode utf-8 url-rewriting

asked Apr 02 '09 at 01:56

Raleigh

4 answers

Getting python to print in UTF8 on Windows XP with the console

I would like to configure my console on Windows XP to support UTF-8 and to have Python detect that and work with it. So far, my attempts: C:\Documents and Settings\Philippe>C:\Python25\python.exe Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC…

python windows unicode utf-8 python-2.x

asked Aug 10 '11 at 16:34

Philippe F

1 answer

StreamReader is unable to correctly read extended character set (UTF8)

I am having an issue where I am unable to read a file that contains foreign characters. The file, I have been told, is encoded in UTF-8 format. Here is the core of my code: using (FileStream fileStream = fileInfo.OpenRead()) { using…

c# unicode streamreader

asked Jul 11 '11 at 23:50

PolandSpring

3 answers

Delphi2010: Writing code to assign Caption containing Unicode literal values or load unicode symbols from text file?

How to make a Unicode program in Delphi 2010? I have English Windows and "Current language for non-Unicode programs" is English too. Static controls look good but if I try to change them (Label.Caption := 'unicode value' or…

delphi text unicode project unicode-literals

asked Jun 07 '11 at 19:30

Michael

6 answers

How to print tuples of unicode strings in original language (not u'foo' form)

I have a list of tuples of unicode objects: >>> t = [('亀',), ('犬',)] Printing this out, I get: >>> print t [('\xe4\xba\x80',), ('\xe7\x8a\xac',)] which I guess is a list of the utf-8 byte-code representation of those strings? but what I want to…

python unicode

asked Mar 07 '09 at 04:44

Daniel H

3 answers

Parse a non-ascii (unicode) number-string as integer in .NET

I have a string containing a number in a non-ASCII format, e.g. Unicode BENGALI DIGIT ONE (U+09E7): "১". How do I parse this as an integer in .NET? Note: I've tried using int.Parse() specifying a Bengali culture format with "bn-BD" as the…

.net unicode

asked May 26 '11 at 15:41

James McCormack
