In Python I can print a unicode character by name (e.g. `print(u'\N{snowman}')`). Is there a way I can get a list of all valid names?

- That'd be the *whole Unicode standard*. – Martijn Pieters May 18 '15 at 12:06
- Why do you ask this question? – Mike Graham May 18 '15 at 12:16
- @MikeGraham Want to play a little game with my students. – Miki Tebeka May 18 '15 at 12:17
- Beware that if they have a different version of Python, the game may backfire on you: see [Martijn Pieters' answer below](http://stackoverflow.com/a/30302840/2564301). – Jongware May 18 '15 at 12:21
7 Answers
Every codepoint has a name, so you are effectively asking for the Unicode standard list of codepoint names (as well as the list of name aliases, supported by Python 3.3 and up).
Each Python version supports a specific version of the Unicode standard; the `unicodedata.unidata_version` attribute tells you which one for a given Python runtime. The above links lead to the latest published Unicode version; replace `UCD/latest` in the URLs with the value of `unicodedata.unidata_version` for your Python version.
Per codepoint, the `unicodedata.name()` function can tell you the official name, and `unicodedata.lookup()` gives you the inverse (name to codepoint).
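As a rough illustration of the three calls mentioned above, here is a minimal sketch assuming Python 3. The variable names are only for illustration, and the printed Unicode version depends on your interpreter.

```python
import unicodedata

# Unicode version bundled with this interpreter (varies by Python version)
print(unicodedata.unidata_version)

# Name -> character
snowman = unicodedata.lookup("SNOWMAN")

# Character -> official name
snowman_name = unicodedata.name(snowman)

print(snowman, snowman_name)
```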

- Are functions `name` and `lookup` really inverse? Indeed, `name(lookup('space'))` returns `SPACE`. But `lookup('escape')` returns expected value and `name(lookup('escape'))` raises `ValueError: no such name`. – Jeyekomon Jul 28 '22 at 09:25
- @Jeyekomon not all Unicode codepoints have a name; `escape` is an alias instead. `lookup()` takes names and aliases (and sequences) but `name()` only ever returns the official name. It’s mostly the control codes like escape that don’t have a name. Note that `space` is an alias, names are always uppercase. Wikipedia has a [nice overview of what doesn’t have a name](https://en.wikipedia.org/wiki/Unicode_character_property#Name). – Martijn Pieters Aug 13 '22 at 11:45
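The asymmetry described in these comments can be checked directly. This is a small sketch, assuming Python 3.3 or later (where `lookup()` accepts aliases); the `"<no name>"` placeholder is an arbitrary choice made here.

```python
import unicodedata

# "ESCAPE" is an alias for U+001B, so lookup() resolves it
escape = unicodedata.lookup("ESCAPE")
print(repr(escape))

# name() only returns official names, and U+001B has none
try:
    unicodedata.name(escape)
    has_name = True
except ValueError:
    has_name = False
print(has_name)

# Passing a default avoids the exception
label = unicodedata.name(escape, "<no name>")
print(label)
```

So `lookup()` accepts more strings than `name()` can ever return, and the two only round-trip for characters that have an official name.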
If you want a list of all Unicode character names, consider downloading the Unicode Character Database.
It is included in the base repositories of many Linux distributions (e.g. "unicode-ucd" on RHEL).
The package includes NamesList.txt, which contains the exhaustive list of Unicode character names.
Caution: NamesList.txt takes some time to download (size > 1.5 MB).
Example:
21FE RIGHTWARDS OPEN-HEADED ARROW
21FF LEFT RIGHT OPEN-HEADED ARROW
@@ 2200 Mathematical Operators 22FF
@@+
@ Miscellaneous mathematical symbols
2200 FOR ALL
= universal quantifier
2201 COMPLEMENT
x (latin letter stretched c - 0297)
2202 PARTIAL DIFFERENTIAL
2203 THERE EXISTS
= existential quantifier
2204 THERE DOES NOT EXIST
: 2203 0338
2205 EMPTY SET
= null set
* used in linguistics to indicate a null morpheme or phonological "zero"
x (latin capital letter o with stroke - 00D8)
x (diameter sign - 2300)
~ 2205 FE00 zero with long diagonal stroke overlay form
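To pull just the names out of a listing like this, one possible approach is to keep only the lines that start with a hexadecimal code point. The sketch below is not an official parser. It assumes entry lines begin with a 4 to 6 digit code point followed by whitespace, that annotation and header lines begin with some other character, and that placeholder labels such as `<control>` should be skipped. `parse_names` and `SAMPLE` are hypothetical names used only here.

```python
import re

# Code point at the start of the line, then whitespace, then the name
ENTRY = re.compile(r"^([0-9A-F]{4,6})\s+(\S.*)$")

def parse_names(lines):
    """Return {code_point: name} for the entry lines in a NamesList-style listing."""
    names = {}
    for line in lines:
        match = ENTRY.match(line.rstrip("\r\n"))
        # Skip placeholder labels such as "<control>" (assumed convention)
        if match and not match.group(2).startswith("<"):
            names[int(match.group(1), 16)] = match.group(2)
    return names

SAMPLE = """\
21FF LEFT RIGHT OPEN-HEADED ARROW
@@ 2200 Mathematical Operators 22FF
2200 FOR ALL
= universal quantifier
2204 THERE DOES NOT EXIST
: 2203 0338
"""

names = parse_names(SAMPLE.splitlines())
for code_point, name in sorted(names.items()):
    print(f"U+{code_point:04X} {name}")
```

For the real file you would pass an open file object instead of `SAMPLE.splitlines()`.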

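Yes, there is a way: go through all existing code points and call `unicodedata.name()` on each of them, like this:
import unicodedata

names = []
for c in range(0, 0x10FFFF + 1):
    try:
        # unicodedata.name() expects a one-character string, not an int
        names.append(unicodedata.name(chr(c)))
    except ValueError:
        # raised for code points that have no name
        pass
# Do something with names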

- At least in Python 3, it should be `except ValueError` instead of `except KeyError`. https://docs.python.org/3/library/unicodedata.html#unicodedata.name – Dominique Unruh Jun 02 '22 at 12:25
For a given codepoint, you can use `unicodedata.name`. To get them all, you can work through all the billions to see which have such names.

- Not billions. The standard isn't **that** big. Yet. Unicode 7.0 contains 112,804. – Martijn Pieters May 18 '15 at 12:08
- There aren't billions of names, but there are billions of potential codepoints to work through and check if we march through naively. – Mike Graham May 18 '15 at 12:11
- There are (and forever will be) exactly 1,114,112 potential code points. You'd have to be extremely naïve to walk the entire 32-bit space. – 一二三 May 18 '15 at 13:15
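The 1,114,112 figure is 17 planes of 65,536 code points each, which is small enough to walk in full. Here is a quick sketch, assuming Python 3.3 or later (where `sys.maxunicode` is always `0x10FFFF`). The count of named code points depends on the Unicode version your interpreter ships with, so no number is given for it.

```python
import sys
import unicodedata

# 17 planes of 65,536 code points each
TOTAL_CODE_POINTS = sys.maxunicode + 1
print(TOTAL_CODE_POINTS, TOTAL_CODE_POINTS == 17 * 65536)

# Count the code points that have an official name
named = sum(
    1
    for code_point in range(TOTAL_CODE_POINTS)
    if unicodedata.name(chr(code_point), "")
)
print(named)
```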
Just print them all:
import unicodedata

for i in range(0x110000):
    character = chr(i)
    # The second argument is returned when the code point has no name
    name = unicodedata.name(character, "")
    if len(name) > 0:
        print(f"{i:6} | 0x{i:04X} | {character} | {name}")

If you want to insert a Unicode character by name but don't know the name, here is how to get an easy overview of Unicode character names.
On Windows
- Open "Character Map" (search for charmap.exe and run it).
- Select any common Microsoft font (these tend to have a wide variety of unicode characters defined).
- Click on any character on the map to get its Unicode Character Name.
On Mac it's called "Character Palette" and found under System Preferences, "International -> Input" or "Language & Text -> Input Sources" by ticking the box next to "Character Palette".

my one liner, just for my own reference ;p
import unicodedata
names = [unicodedata.name(chr(c)) for c in range(0, 0x10FFFF+1) if unicodedata.name(chr(c), None)]
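A variation on the same idea, offered only as a sketch: a generator that calls `unicodedata.name()` once per code point, plus a small search helper. `named_characters` and `find_by_word` are hypothetical names introduced here, and Python 3 is assumed.

```python
import unicodedata

def named_characters():
    """Yield (character, name) for every code point that has a name."""
    for code_point in range(0x110000):
        character = chr(code_point)
        name = unicodedata.name(character, None)
        if name is not None:
            yield character, name

def find_by_word(word):
    """Return (character, name) pairs whose name contains the given whole word."""
    word = word.upper()  # official names are always uppercase
    return [(c, n) for c, n in named_characters() if word in n.split()]

matches = find_by_word("snowman")
for character, name in matches:
    print(character, name)
```

Matching on whole words keeps the result list short; a substring test (`word in n`) would be a looser alternative.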
