Questions tagged [unicode]

Unicode is a standard for the encoding, representation and handling of text. It is intended to support all the characters required for written text across all writing systems, including technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

U+0041 A
U+0042 B
U+0043 C
...
U+039B Λ
U+039C Μ
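The mapping between characters and code points can be inspected directly in most languages. As a minimal sketch in Python 3, the built-in `ord` and `chr` convert in each direction; the helper name `code_point_label` is hypothetical and only used for illustration.

```python
def code_point_label(ch: str) -> str:
    """Return the code point of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# Characters from the list above; the Greek letters are written as escapes
# so the example does not depend on the source file's encoding.
for ch in "ABC\u039b\u039c":
    print(code_point_label(ch), ch)

# chr() goes the other way, from a code point to a character.
lambda_char = chr(0x039B)
```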

Unicode Transformation Formats

UTFs describe how to encode code points as byte representations. The most common forms are UTF-8 (which encodes code points as a sequence of one, two, three or four bytes) and UTF-16 (which encodes code points as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
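The rows above can be reproduced with Python 3's built-in codecs. This is a sketch, not a reference implementation: `hex_bytes` and `encodings_of` are hypothetical helper names, and U+1F600 is an extra sample added here to show a code point that needs four bytes in both encodings.

```python
def hex_bytes(data: bytes) -> str:
    """Render bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def encodings_of(ch: str) -> tuple[str, str]:
    """Return (UTF-8, UTF-16 big-endian) byte strings for one character."""
    return hex_bytes(ch.encode("utf-8")), hex_bytes(ch.encode("utf-16-be"))


# U+0041 and U+039B come from the table; U+1F600 lies outside the Basic
# Multilingual Plane, so UTF-16 encodes it as a surrogate pair.
for ch in ("A", "\u039b", "\U0001F600"):
    utf8, utf16 = encodings_of(ch)
    print(f"U+{ord(ch):04X}", utf8, "|", utf16)
```

Using the explicit `utf-16-be` codec avoids the byte order mark that Python's plain `utf-16` codec prepends.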

UTF FAQ, UTF-16 FAQ, UTF-8 FAQ

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
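Normalization and case mapping are two of these operations that are easy to try out. The sketch below uses Python 3's standard `unicodedata` module and `str` methods; `same_text` is a hypothetical helper, and locale-aware sorting is not shown because it depends on the environment.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = "\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE

# The two strings look identical when rendered but differ code point by code point.
print(decomposed == composed, same_text(decomposed, composed))

# Case mapping is not always one-to-one: U+00DF (sharp s) uppercases to "SS".
upper_sharp_s = "\u00df".upper()
folded_sharp_s = "\u00df".casefold()
```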

Latest Version of the Standard

Identifying Characters

For more general information, see the Unicode article on Wikipedia.

24916 questions

4 answers

freebcp: "Unicode data is odd byte size for column. Should be even byte size"

This file works fine (UTF-8): $ cat ok.txt 291054 Ţawī Rifā This file causes an error (UTF-8): $ cat bad.txt 291054 Ţawī Rifā‘ Here's the message: $ freebcp 'DB.dbo.table' in bad.txt ... -c Starting copy... Msg 20050, Level 4 Attempt to convert…

sql-server unicode sql-server-2012 character-encoding freetds

asked Aug 16 '16 at 20:38

Neil McGuigan

2 answers

How to convert Unicode Character to Int in Swift

A user asked the following question to one of my answers. I have a unicode character \u{0D85}. How do I get the Int value from it? I was going to refer them to another Stack Overflow Q&A but I couldn't find one. These refer to converting the…

swift unicode integer

asked Aug 04 '16 at 23:51

Suragch

2 answers

Decoding if it's not unicode

I want my function to take an argument that could be an unicode object or a utf-8 encoded string. Inside my function, I want to convert the argument to unicode. I have something like this: def myfunction(text): if not isinstance(text, unicode): …

python unicode encoding utf-8

asked Oct 04 '10 at 17:47

Manuel Ceron

1 answer

How can I truncate a string to have at most N characters?

The expected approach of String.truncate(usize) fails because it doesn't consider Unicode characters (which is baffling considering Rust treats strings as Unicode). let mut s = "ボルテックス".to_string(); s.truncate(4); thread '' panicked at 'assertion…

string unicode rust truncate

asked Jul 19 '16 at 14:31

Peter Uhnak

1 answer

How to pad and align unicode strings with special characters in python?

Python makes it easy to pad and align ascii strings, like so: >>> print "%20s and stuff" % ("test") test and stuff >>> print "{:>20} and stuff".format("test") test and stuff But how can I properly pad and align…

python unicode string-formatting

asked May 19 '16 at 22:27

camomilk

2 answers

Why doesn't Perl v5.22 find all the sentence boundaries?

This is fixed in Perl 5.22.1. I write about it in Perl v5.22 adds fancy Unicode word boundaries. Perl v5.22 added the Unicode assertions from TR #29. I've been playing with the sentence boundary assertion, but it only seems to find the start and…

regex perl unicode

asked Apr 25 '16 at 05:32

brian d foy

1 answer

How to do Unicode escaping in YAML multiline string?

Is it possible to use Unicode character escaping (e.g. \u2009) in YAML multiline strings? this_escape_works: "foo\u2009bar" this_escape_doesnt: > foo\u2009bar

unicode yaml

asked Mar 02 '16 at 11:37

Sampo

2 answers

How to decode a unicode string Python

What is the best way to decode an encoded string that looks like: u'u\xf1somestring' ? Background: I have a list that contains random values (strings and integers), I'm trying to convert every item in the list to a string then process each of…

string python-2.7 unicode decode encode

asked Jan 29 '16 at 11:26

mfalade

1 answer

How to detect when bytes can't be converted to string in Go?

There are invalid byte sequences that can't be converted to Unicode strings. How do I detect that when converting []byte to string in Go?

string unicode encoding go utf-8

asked Jan 18 '16 at 18:20

codefx

6 answers

UTF-8 or UTF-16 or UTF-32 or UCS-2

I am designing a new CMS but want to design it to fit all my future needs like Multilingual content so i was thinking Unicode (UTF-8) is the best solution But with some search i got this article…

c# asp.net unicode

asked Aug 13 '10 at 01:37

Pola Edward

4 answers

Emacs, unicode, xterm mouse escape sequences, and wide terminals

Short version: When using emacs' xterm-mouse-mode, Somebody (emacs? bash? xterm?) intercepts xterm's control sequences and replaces them with \0. This is a pain on wide monitors because only the first 223 columns have mouse. What is the culprit,…

emacs unicode utf-8 mouse xterm

asked Aug 12 '10 at 10:16

Ryan

3 answers

Word wrapping in pango with mixed scripts

I have a text box implementation that uses pango. If i put a string that starts with a word in right-to-left script, followed by a space, followed by word in left-to-right based script, the word wrapping that pango uses gets messed up (using…

text unicode arabic unicode-string pango

asked Dec 09 '15 at 18:47

default

3 answers

How do I get the "visible" length of a combining Unicode string in Python?

If I have a Python Unicode string that contains combining characters, len reports a value that does not correspond to the number of characters "seen". For example, if I have a string with combining overlines and underlines such as…

python python-2.7 unicode

asked Oct 26 '15 at 17:10

orome

3 answers

how to deal with unicode in mako?

I constantly get this error using mako: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 6: ordinal not in range(128) I've told mako I'm using unicode in any possible way: mylookup = TemplateLookup( …

python unicode mako

asked Jul 26 '10 at 09:19

Giorgio Gelardi

4 answers

Parsing command line arguments in a unicode C++ application

How can I parse integers passed to an application as command line arguments if the app is unicode? Unicode apps have a main like this: int _tmain(int argc, _TCHAR* argv[]) argv[?] is a wchar_t*. That means i can't use atoi. How can I convert it to…

c++ command-line unicode

asked Dec 02 '08 at 02:27

David Reis
