Questions tagged [unicode]

Unicode is a standard for encoding, representing and handling text. It aims to support every character needed for written text across all writing systems, including technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

U+0041 A
U+0042 B
U+0043 C
...
U+039B Λ
U+039C Μ
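
As a rough illustration of the mapping above, Python's built-in `ord` and `chr` convert between a character and its code point. The helper name `code_point_label` is made up for this sketch and is not part of any library.

```python
def code_point_label(ch: str) -> str:
    """Format a single character's code point in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# "\u039B" and "\u039C" are the Greek capitals Lambda and Mu from the list above.
for ch in "ABC\u039B\u039C":
    print(code_point_label(ch), ch)

# chr() goes the other way, from a code point to its character.
lambda_char = chr(0x039B)
print(lambda_char)
```

Escapes are used for the Greek letters only so that the sketch does not depend on the encoding of the source file.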

Unicode Transformation Formats

UTFs describe how to encode code points as byte representations. The most common forms are UTF-8 (which encodes code points as a sequence of one, two, three or four bytes) and UTF-16 (which encodes code points as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
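
The rows of the table can be reproduced with Python's `str.encode`, as a minimal sketch. The helper `hex_bytes` is a hypothetical name introduced here, and the last lines use U+1F600 (not in the table) as an assumed example of a code point that needs four bytes in both encodings.

```python
def hex_bytes(data: bytes) -> str:
    """Render bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


for ch in "A\u039B":
    utf8 = hex_bytes(ch.encode("utf-8"))
    utf16 = hex_bytes(ch.encode("utf-16-be"))  # big-endian, no byte order mark
    print(f"U+{ord(ch):04X}  {utf8:<12} {utf16}")

# A code point above U+FFFF takes four bytes in UTF-8 and a
# surrogate pair (also four bytes) in UTF-16.
supplementary = "\U0001F600"
print(hex_bytes(supplementary.encode("utf-8")))
print(hex_bytes(supplementary.encode("utf-16-be")))
```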

UTF FAQ, UTF-16 FAQ, UTF-8 FAQ

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
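
Two of these operations, normalization and case mapping, can be tried with Python's standard library alone. This is only a sketch of the idea: the sample strings are assumptions chosen for illustration, and locale-aware sorting is not covered because it typically needs a dedicated collation library.

```python
import unicodedata

composed = "\u00E9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two strings render the same but are different code point sequences.
same_raw = composed == decomposed

# Normalizing both to the same form makes them comparable.
same_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_nfd = unicodedata.normalize("NFD", composed) == decomposed
print(same_raw, same_nfc, same_nfd)

# Case mapping is not always one-to-one: the German sharp s uppercases to "SS".
upper = "stra\u00DFe".upper()
print(upper)
```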

Latest Version of the Standard

Identifying Characters

For more general information, see the Unicode article on Wikipedia.

24916 questions

3 answers

How to make MySQL aware of multi-byte characters in LIKE and REGEXP?

I have a MySQL table with two columns, both utf8_unicode_ci collated. It contains the following rows. Except for ASCII, the second field also contains Unicode codepoints like U+02C8 (MODIFIER LETTER VERTICAL LINE) and U+02D0 (MODIFIER LETTER…

mysql sql unicode utf-8 character-encoding

asked Jun 26 '11 at 06:39

Tim

2 answers

How can I audit my Windows application for correct Unicode handling?

I can't use prepackaged Unicode string libraries, such as ICU, because they blow up the size of the binary to an insane degree (it's a 200k program; ICU is 16MB+!). I'm using the builtin wchar_t string type for everything already, but I want to…

c++ winapi unicode

asked Jun 20 '11 at 15:42

Billy ONeal

4 answers

If UTF-8 is an 8-bit encoding, why does it need 1-4 bytes?

On the Unicode site it's written that UTF-8 can be represented by 1-4 bytes. As I understand from this question https://softwareengineering.stackexchange.com/questions/77758/why-are-there-multiple-unicode-encodings UTF-8 is an 8-bits encoding. So,…

unicode encoding utf-8

asked Jun 14 '11 at 04:07

Sergey

5 answers

What's the point of String.normalize()?

While reviewing JavaScript concepts, I found String.normalize(). This is not something that shows up in W3Schools' "JavaScript String Reference", and, hence, it is the reason I might have missed it before. I found more information about it in…

javascript string unicode normalization

asked Jul 21 '20 at 11:30

Tiago Martins Peres

2 answers

Cyrillic alphabet validation

I came across an interesting defect today. The issue is that I have a deployment of my web application in Russia, and the name value "Наталья" is not returning true as alphaNumeric in the method below. Curious for some input on how people would approach a…

java regex validation unicode internationalization

asked Jun 06 '11 at 19:09

Duncan Krebs

2 answers

How to correctly display unicode characters in VS Code's Integrated Terminal?

As per title, I can't seem to get VS Code Integrated Terminal to correctly display unicode characters. They always show up as question marks (?) in the integrated terminal. I've ensured that the files are saved with encoding UTF-8 which seemed to…

java unicode visual-studio-code

asked Aug 22 '19 at 15:15

Sheng Ying

2 answers

Proper Way to Insert Strings to a SQLAlchemy Unicode Column

I have a SQLAlchemy model with a Unicode column. I sometimes insert unicode values to it (u'Value'), but also sometimes insert ASCII strings. What is the best way to go about this? When I insert ASCII strings with special characters I get this…

python unicode sqlalchemy insertion

asked Apr 20 '11 at 18:57

Raiders

2 answers

What is the difference between "combining characters" and "modifier letters"?

In the Unicode standard, there are diacritical marks, such as U+0302, COMBINING CIRCUMFLEX ACCENT (◌̂), and U+02C6, MODIFIER LETTER CIRCUMFLEX ACCENT (ˆ). I know that combining characters are combined with the previous letter to, say, make a letter…

unicode character

asked Jan 30 '19 at 22:57

Greg

2 answers

Unicode identifiers (function names) for non-localization purposes advisable?

PHP allows Unicode identifiers for variables, functions, classes and constants anyhow. It was certainly intended for localized applications. Whether it's a good idea to code an API in anything but English is debatable, but it's undisputed that some…

php unicode identifier

asked Mar 18 '11 at 23:32

mario

4 answers

A resilient, actually working CSV implementation for non-ascii?

[Update] Appreciate the answers and input all around, but working code would be most welcome. If you can supply code that can read the sample files you are king (or queen). [Update 2] Thanks for the excellent answers and discussion. What I need to…

python unicode encoding

asked Feb 16 '11 at 18:19

Parand

1 answer

What is the efficient, standards-compliant mechanism for processing Unicode using C++17?

Short version: If I wanted to write a program that can efficiently perform operations with Unicode characters and can input and output files in UTF-8 or UTF-16 encodings, what is the appropriate way to do this with C++? Long version: C++…

c++ unicode encoding locale utf

asked Feb 15 '18 at 21:46

Poeta Kodu

7 answers

Is there an STL string class that properly handles Unicode?

I know all about std::string and std::wstring but they don't seem to fully pay attention to extended character encoding of UTF-8 and UTF-16 (on Windows at least). There is also no support for UTF-32. So does anyone know of cross-platform drop-in…

c++ unicode stl unicode-string

asked Feb 01 '11 at 11:28

Goz

4 answers

Windows Console and Qt Unicode Text

I spent a whole day trying to figure this out with no luck. I looked everywhere but found no working code. OS: Win XP Sp2 IDE & FRAMEWORK: C++, Qt Creator 2.0. I am trying to output some Unicode (UTF-8) text to the Windows console but all I see…

qt unicode console

asked Jan 22 '11 at 05:14

user440297

3 answers

python: unicode problem

I am trying to decode a string I took from file: file = open ("./Downloads/lamp-post.csv", 'r') data =…

python unicode

asked Jan 19 '11 at 13:06

Oleg Tarasenko

3 answers

Detect if character is simplified or traditional Chinese character

I found this question which gives me the ability to check if a string contains a Chinese character. I'm not sure if the unicode ranges are correct but they seem to return false for Japanese and Korean and true for Chinese. What it doesn't do is tell…

unicode cjk

asked Jan 06 '11 at 20:28

thenengah
