Questions tagged [unicode]

Unicode is a standard for encoding, representing and handling text. It aims to support every character needed for written text across all writing systems, including technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

U+0041 A
U+0042 B
U+0043 C
...
U+039B Λ
U+039C Μ

Unicode Transformation Formats

A Unicode Transformation Format (UTF) defines how code points are encoded as bytes. The most common are UTF-8, which encodes each code point as one to four bytes, and UTF-16, which encodes each code point as two or four bytes (one or two 16-bit code units).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
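
The table above can be reproduced with a short Python 3 sketch (3.8 or later, for `bytes.hex` with a separator). The helper name `encode_table` is hypothetical and only illustrates the mapping; U+1F600 is added here as an assumption-free extra to show the four-byte case in both encodings.

```python
def encode_table(text):
    """Return (code point, UTF-8 bytes, UTF-16 big-endian bytes) per character."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"                 # ord() gives the code point
        utf8 = ch.encode("utf-8").hex(" ").upper()      # 1 to 4 bytes
        utf16 = ch.encode("utf-16-be").hex(" ").upper() # 2 or 4 bytes
        rows.append((code_point, utf8, utf16))
    return rows


# Same characters as the table, plus one code point outside the BMP.
for row in encode_table("ABC\u039b\u039c\U0001F600"):
    print("{:<12}{:<16}{}".format(*row))
```

Code points above U+FFFF, such as U+1F600, take four bytes in UTF-8 and a surrogate pair (four bytes) in UTF-16.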

UTF FAQ, UTF-16 FAQ, UTF-8 FAQ

Specification

The Unicode Consortium also defines standards for collation (sorting), case mapping, character normalization and other character operations, some of which are locale-sensitive.
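
As a rough illustration of normalization and case mapping, here is a minimal Python 3 sketch using the standard `unicodedata` module. The helper `same_text` is a hypothetical name, and NFC is just one of the normalization forms; which form is appropriate depends on the application.

```python
import unicodedata

composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)           # raw code point comparison
print(same_text(composed, decomposed))  # comparison after normalization
print("stra\u00dfe".upper())            # case mapping can change string length
```

The two spellings of "é" compare unequal as raw strings but equal after normalization, and upper-casing "straße" yields "STRASSE", one character longer than the input.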

Latest Version of the Standard

Identifying Characters

For more general information, see the Unicode article on Wikipedia.


24916 questions


2 answers

Remove or match a Unicode Zero Width Space PHP

I have a text in Burmese language, UTF-8. I am using PHP to work with the text. At some point along the way, some ZWSPs have crept in and I would like to remove them. I have tried two different ways of removing the characters, and neither seems…

php replace unicode

asked Mar 24 '14 at 02:43

Jimmy Long


3 answers

Unicode vs Multi-byte

I'm really confused by this unicode vs multi-byte thing. Say I'm compiling my program in Unicode (but ultimately, I want a solution that is independent of the character set used). 1) Will all 'char' be interpreted as wide characters? 2) If I have a…

c unicode visual-c++ multibyte

asked Feb 09 '10 at 03:17

Rayne


1 answer

Display width of unicode strings in Python

How can I determine the display width of a Unicode string in Python 3.x, and is there a way to use that information to align those strings with str.format()? Motivating example: Printing a table of strings to the console. Some of the strings contain…

python string unicode width python-unicode

asked Mar 06 '14 at 13:05

Christian Aichinger


2 answers

Unicode character usage statistics

I am looking for some statistical data on the usage of Unicode characters in textual documents (with any markup). Googling brought no results. Background: I am currently developing a finite state machine-based text processing tool. Statistical data…

unicode

asked Mar 04 '14 at 22:35

lexicore


9 answers

replace emoji unicode symbol using regexp in javascript

As you all know, emoji symbols are encoded in up to 3 or 4 bytes, so one may occupy 2 symbols in my string. For example 'wew'.length = 7 I want to find those symbols in my text and replace them with a value that depends on their code. Reading SO, I…

javascript regex unicode emoji

asked Feb 25 '14 at 06:21

Fedor Skrynnikov


7 answers

Ruby 1.9 doesn't support Unicode normalization yet

I'm trying to port over some of my old rails apps to Ruby 1.9 and I keep getting warnings about how "Ruby 1.9 doesn't support Unicode normalization yet." I've tracked it down to this function, but I'm getting about 20 warning messages per…

ruby-on-rails ruby unicode

asked Jan 25 '10 at 20:08

go minimal


4 answers

An equivalent to string.ascii_letters for unicode strings in python 2.x?

In the "string" module of the standard library, string.ascii_letters ## Same as string.ascii_lowercase + string.ascii_uppercase is 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' Is there a similar constant which would include everything…

python unicode python-2.x

asked Jan 24 '10 at 09:26

emm


1 answer

Select unicode character u2028 in mysql 5.1

I am trying to select Unicode character /u2028 in MySQL 5.1. MySQL 5.1 does support utf8 and ucs2. In newer versions of MySQL I could select the char just by using utf16 or utf32 collation: SELECT char(0x2028 using utf16); SELECT char(0x00002028…

mysql unicode utf-8

asked Jan 20 '14 at 12:09

jelhan


1 answer

Pass a list of string from Django to Javascript

My Django objects have an attribute "City". I'm trying to get a list of cities and catch it in the template with jQuery (to use in a chart on the X axis). My problem is that I can't get rid of the unicode and quote for a list. (I manage to do it for…

javascript jquery django unicode

asked Jan 15 '14 at 23:02

xavier carbonel


4 answers

HTML unicode arrow works on Safari desktop, but not Safari for iOS

I'm using the ❯ arrow on a page, and it renders properly on Chrome, Firefox and Safari on OS X, however in Safari on iOS (iPhone), the arrows render as empty boxes (you know, the "unable to render" box). Any ideas on why this is happening and what I…

html ios css unicode safari

asked Dec 27 '13 at 17:00

james.spinella


2 answers

How to input Unicode character in Rails console?

While using the Rails console, when I input ä, \U+FFC3\U+FFA4 appears. Of course I can input Unicode characters outside of Rails. I'm using Ruby 2.0.0p247, Rails 4.0.0 on Mac OS X 10.7.5. How can I input Unicode characters in the Rails console?

ruby-on-rails unicode console

asked Nov 08 '13 at 11:02

ironsand


3 answers

Javascript: Non-unicode char code to unicode character?

I'm having a character code issue with a barcode scanner used to input characters to a web interface. If a barcode has a symbol such as - (a dash/hyphen/minus) it gives me character code 189 which is correct in many character sets. Indeed, if I have…

javascript unicode character-encoding keypress keycode

asked Oct 09 '13 at 16:49

Scott F


4 answers

How to find and count emoticons in a string using python?

This topic has been addressed for text based emoticons at link1, link2, link3. However, I would like to do something slightly different than matching simple emoticons. I'm sorting through tweets that contain the emoticons' icons. The following…

python regex string unicode

asked Oct 03 '13 at 00:57

blehman


2 answers

How to display unicode in SVG?

Information is stored in SVG format in the database. If the data contains text, it is displayed as Unicode. The SVG files need to display correctly in the browser.

unicode svg

asked Oct 02 '13 at 10:36

adelak


5 answers

Unicode filenames on Windows with Python & subprocess.Popen()

Why does the following occur: >>> u'\u0308'.encode('mbcs') #UMLAUT '\xa8' >>> u'\u041A'.encode('mbcs') #CYRILLIC CAPITAL LETTER KA '?' >>> I have a Python application accepting filenames from the operating system. It works for some…

python windows unicode

asked Dec 15 '09 at 20:45

Norman
