Questions tagged [unicode]

Unicode is a standard for encoding, representing and handling text. It aims to support every character needed for written text across all writing systems, including technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

U+0041 A
U+0042 B
U+0043 C
...
U+039B Λ
U+039C Μ
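As a minimal sketch, assuming Python 3 (picked only for illustration, since the tag itself is language-neutral), the built-in `ord` and `chr` functions convert between a character and its code point. The helper name `code_point_label` is hypothetical and not part of any library.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    # ord() gives the code point as an integer; pad to at least 4 hex digits.
    return f"U+{ord(ch):04X}"


# Print the same kind of listing as above for a few characters.
for ch in ["A", "B", "C", "\u039b", "\u039c"]:
    print(code_point_label(ch), ch)

# chr() goes the other way, from code point to character.
capital_lambda = chr(0x039B)
```

Note that `ord` works on one code point at a time; a string that a user perceives as a single character may consist of several code points.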

Unicode Transformation Formats

Unicode Transformation Formats (UTFs) describe how to encode code points as byte sequences. The most common forms are UTF-8 (which encodes each code point as one to four bytes) and UTF-16 (which encodes each code point as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
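The rows above can be reproduced with a short sketch, assuming Python 3 and its standard `str.encode` method with the `utf-8` and `utf-16-be` codecs. The helper `hex_bytes` is a hypothetical name used only for this illustration. U+10428 is included to show a code point outside the Basic Multilingual Plane, which needs four bytes in both encodings.

```python
def hex_bytes(data: bytes) -> str:
    """Format a byte string as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# One ASCII letter, one Greek letter, and one supplementary-plane character.
for ch in ["A", "\u039b", "\U00010428"]:
    utf8 = hex_bytes(ch.encode("utf-8"))
    # "utf-16-be" is big-endian without a byte order mark.
    utf16 = hex_bytes(ch.encode("utf-16-be"))
    print(f"U+{ord(ch):04X}  {utf8:<12} {utf16}")
```

In UTF-16, the four-byte form is a surrogate pair: two 16-bit code units that together represent one code point above U+FFFF.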

UTF FAQ, UTF-16 FAQ, UTF-8 FAQ

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
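To illustrate why normalization matters, here is a hedged sketch using Python 3's standard `unicodedata` module. The same visible character can be stored as different code point sequences, so a plain comparison can fail until both sides are normalized. The function `equivalent` is a hypothetical helper, and comparing NFC forms after `casefold` is only one possible policy, not a rule from the standard.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent


def equivalent(a: str, b: str) -> bool:
    """Compare two strings after case folding and NFC normalization."""
    def norm(s: str) -> str:
        return unicodedata.normalize("NFC", s.casefold())
    return norm(a) == norm(b)


# The raw strings differ even though they render identically.
print(composed == decomposed)
# After normalization they compare equal.
print(equivalent(composed, decomposed))
```

For locale-sensitive sorting, normalization alone is not enough; that is the job of a collation implementation, which this sketch does not attempt.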

Latest Version of the Standard

Identifying Characters

For more general information, see the Unicode article on Wikipedia.

24916 questions

3 answers

comfortable way to use unicode characters in a ggplot graph

Is there a good practice to insert unicode characters in a ggplot title and also save it as pdf? I am struggling with expression, paste and sprintf to get a nice title... So, what works is ggtitle(expression(paste('5', mu, 'g'))) This will print an…

r unicode ggplot2

asked Nov 05 '14 at 08:40

drmariod

2 answers

Can't get Czech characters while generating a PDF

I have a problem when adding characters such as "Č" or "Ć" while generating a PDF. I'm mostly using paragraphs for inserting some static text into my PDF report. Here is some sample code I used: var document = new…

c# asp.net pdf unicode itext

asked Oct 29 '14 at 13:36

perkes456

1 answer

Evaluate UTF-8 literal escape sequences in a string in Python3

I have a string of the form: s = '\\xe2\\x99\\xac' I would like to convert this to the character ♬ by evaluating the escape sequence. However, everything I've tried either results in an error or prints out garbage. How can I force Python to convert…

python string python-3.x unicode utf-8

asked Oct 11 '14 at 05:04

Altay_H

1 answer

Beautiful Soup Unicode encode error

I am trying the following code with a particular HTML file from BeautifulSoup import BeautifulSoup import re import codecs import sys f = open('test1.html') html = f.read() soup = BeautifulSoup(html) body = soup.body.contents para =…

python unicode beautifulsoup

asked Apr 13 '10 at 04:58

Rohit Banga

4 answers

HTML unicode ☰ not detected in mobile web application menu in android chrome browser

I have an issue with my website menu in the Android mobile Chrome browser: it is not able to show the Unicode character ☰. If I check my web application on an iPhone or in another Android browser, it renders properly. I used this icon in this…

android css google-chrome unicode

asked Sep 19 '14 at 08:52

Mohammed Javed

4 answers

opencv imread() on Windows for non-ASCII file names

We have an OpenCV problem of opening (and writing) file paths that contain non-ASCII characters on Windows. Affected functions are: cv::imread(), cv::imwrite(), ... As far as I saw in the OpenCV source code, it uses fopen even on Windows (instead of…

c++ opencv winapi unicode imread

asked Jul 15 '14 at 23:07

Vyacheslav

1 answer

How to write 3 bytes unicode literal in Java?

I'd like to write unicode literal U+10428 in Java. http://www.marathon-studios.com/unicode/U10428/Deseret_Small_Letter_Long_I I tried with '\u10428' and it doesn't compile.

java unicode utf-16 utf-32 unicode-literals

asked Jul 08 '14 at 13:35

kawty

7 answers

Why can't I use accented characters next to a word boundary?

I'm trying to make a dynamic regex that matches a person's name. It works without problems on most names, until I ran into accented characters at the end of the name. Example: Some Fancy Namé The regex I've used so far is: /\b(Fancy…

javascript regex unicode replace diacritics

asked Mar 15 '10 at 19:15

Rexxars

5 answers

Programmatically determine number of strokes in a Chinese character?

Does Unicode store stroke count information about Chinese, Japanese, or other stroke-based characters?

unicode character-encoding cjk

asked Mar 07 '10 at 22:53

xkdkxdxc

2 answers

How to create SSL certificate with Unicode characters in the Organization name (or other fields)?

I've created a self-signed SSL certificate and have no trouble using it, but the browser (Firefox, Chrome/IE) shows garbled characters in the Organization's name (anything above ASCII has 2 characters). I created the certificate in a Debian running…

ssl unicode certificate

asked May 11 '14 at 23:06

vesperto

4 answers

Read/Write file with unicode file name with plain C++/Boost

I want to read / write a file with a unicode file name using boost filesystem, boost locale on Windows (mingw) (should be platform independent at the end). This is my code: #include #define BOOST_NO_CXX11_SCOPED_ENUMS #include…

c++ boost unicode boost-filesystem boost-locale

asked Apr 30 '14 at 16:53

Mike M

2 answers

Python: solving unicode hell with unidecode

I have been working on ways to flatten text into ascii. So ā -> a and ñ -> n, etc. unidecode has been fantastic for this. # -*- coding: utf-8 -*- from unidecode import unidecode print(unidecode(u"ā, ī, ū, ś, ñ")) print(unidecode(u"Estado de São…

python unicode

asked Mar 20 '14 at 17:12

e h

1 answer

Error writing a file with file.write in Python. UnicodeEncodeError

I have never dealt with encoding and decoding strings, so I am quite the newbie on this front. I am receiving a UnicodeEncodeError when I try to write the contents I read from another file to a temporary file using file.write in Python. I get the…

python-2.7 unicode decode encode fwrite

asked Mar 13 '14 at 22:30

user2643864

3 answers

Unicode problems in C++ but not C

I'm trying to write unicode strings to the screen in C++ on Windows. I changed my console font to Lucida Console and I set the output to CP_UTF8 aka 65001. I run the following code: #include //notice this header file.. #include…

c++ c unicode utf-8

asked Jan 26 '14 at 23:32

Brandon

3 answers

"surrogateescape" cannot escape certain characters

Regarding reading and writing text files in Python, one of the main Python contributors mentions this regarding the surrogateescape Unicode Error Handler: [surrogateescape] handles decoding errors by squirreling the data away in a little used part…

python unicode encoding utf-8

asked Jan 14 '14 at 14:30

dotancohen

