Questions tagged [unicode]

Unicode is a standard for the encoding, representation and handling of text with the intention of supporting all the characters required for written text incorporating all writing systems, technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

U+0041 A
U+0042 B
U+0043 C
...
U+039B Λ
U+039C Μ

Unicode Transformation Formats

UTFs describe how to encode code points as byte representations. The most common forms are UTF-8 (which encodes code points as a sequence of one, two, three or four bytes) and UTF-16 (which encodes code points as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C

UTF FAQ, UTF-16 FAQ, UTF-8 FAQ

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.

Latest Version of the Standard

Identifying Characters

For more general information, see the Unicode article on Wikipedia.

Related Tags

24916 questions

votes

8 answers

Inheriting and overriding functions of a std::string?

Since std::string is actually a typedef of a templated class, how can I override it? I want to make a UTF-8 std::string that will return the correct length, among other things.

c++ unicode utf-8 character-encoding

asked Nov 17 '10 at 13:55

jmasterx

52,639
96
311
557

votes

1 answer

wicked_pdf shows unknown character on unicode pdf conversion (ruby)

I'm trying to create a pdf from a html page using wicked_pdf (version 1.1) and wkhtmltopdf-binary gems. My html page contains a calendar emoji that displays well in the browser whatever font I use

ruby-on-rails ruby unicode wkhtmltopdf wicked-pdf

asked Jan 10 '17 at 13:40

rico1892

votes

5 answers

How can I display a tux character in a shell script?

I realize this is very much of a long shot, but... In shell scripts on Macs I can display an Apple character. Is there any way to display a Tux character (or anything else associated with Linux) on Linux systems? The simplest solution would be if…

shell unicode special-characters

asked Oct 30 '10 at 18:21

iconoclast

21,213
15
102
138

votes

2 answers

Python: Find equivalent surrogate pair from non-BMP unicode char

The answer presented here: How to work with surrogate pairs in Python? tells you how to convert a surrogate pair, such as '\ud83d\ude4f' into a single non-BMP unicode character (the answer being "\ud83d\ude4f".encode('utf-16',…

python unicode encoding emoji surrogate-pairs

asked Oct 24 '16 at 16:13

hilssu

votes

3 answers

Why were the code points in the range of U+D800 to U+DFFF removed from the Unicode character set?

I am learning about UTF-16 encoding, and I have read that if you want to represent code points in the range of U+10000 to U+10FFFF, then you have to use surrogate pairs, which are in the range of U+D800 to U+DFFF. So let's say I want to encode the…

unicode encoding character-encoding utf-16

asked Oct 21 '16 at 20:22

paul

votes

4 answers

How to make the Java.awt.Robot type unicode characters? (Is it possible?)

We have a user provided string that may contain unicode characters, and we want the robot to type that string. How do you convert a string into keyCodes that the robot will use? How do you do it so it is also java version independant (1.3 ->…

java unicode automation

asked Dec 29 '08 at 04:07

Greg Domjan

13,943
6
43
59

votes

5 answers

Handling a Unicode String in Delphi Versions <= 2007

Background: This question relates to versions of Delphi below 2009 (ie without Unicode support built in). I have a specification that requires me to transmit a Unicode encoded string over a TCP connection but I do not have Delphi 2009. Question Is…

delphi unicode encoding

asked Dec 20 '08 at 10:55

jamiei

2,006
3
20
28

votes

5 answers

Python: any way to perform this "hybrid" split() on multi-lingual (e.g. Chinese & English) strings?

I have strings that are multi-lingual consist of both languages that use whitespace as word separator (English, French, etc) and languages that don't (Chinese, Japanese, Korean). Given such a string, I want to separate the English/French/etc part…

python string unicode multilingual cjk

asked Sep 27 '10 at 06:02

Continuation

12,722
20
82
106

votes

4 answers

Convert unicode json to normal json in python

I got the following json: {u'a': u'aValue', u'b': u'bValue', u'c': u'cValue'} by doing request.json in my python code. Now, I want to convert the unicode json to normal json, something which should like this: {"a": "aValue", "b": "bValue", "c":…

python json unicode

asked Apr 30 '16 at 11:46

Sanjiban Bairagya

votes

5 answers

How to Show Eastern Letter(Chinese Character) on SQL Server/SQL Reporting Services?

I need to insert chinese characters in my database but it always show ???? .. Example: Insert this record. 微波室外单元-Apple Then it became ??? Result: ??????-Apple I really Need Help...thanks in regard. I am using MSSQL Server 2008

sql-server unicode

asked Sep 10 '10 at 03:53

Crimsonland

2,194
3
24
42

votes

3 answers

In UTF-16, UTF-16BE, UTF-16LE, is the endian of UTF-16 the computer's endianness?

UTF-16 is a two-byte character encoding. Exchanging the two bytes' addresses will produce UTF-16BE and UTF-16LE. But I find the name UTF-16 encoding exists in the Ubuntu gedit text editor, as well as UTF-16BE and UTF-16LE. With a C test program I…

c unicode endianness utf-16

asked Apr 11 '16 at 13:24

hao.zhou

votes

2 answers

Where is the character encoding of a text file stored in Linux?

I know the short answer should be "nowhere", however there's something that doesn't quite add up in the following test 2. Test 1. In Gedit, I create a new file containing only the string "aàbï", I choose "Save As" and there's a selector for choosing…

linux unicode encoding utf-8

asked Mar 27 '16 at 20:08

matteo

2,934
7
44
59

votes

0 answers

How can I open a file that has a Chinese Filename in C?

For some reasons, I need to write files that have Chinese characters in the filenames(C language). For example: #include #include FILE *fout; int main(){ fout = fopen("乔布斯.txt", "w"); fprintf(fout, "Jobs"); //Do…

c file unicode

asked Jan 26 '16 at 09:27

Jeffchan

votes

3 answers

What are the most difficult-to-render Unicode samples?

I'm trying to implement a cross-platform (desktop browsers, iOS, & Android) typography system that allows users to input any Unicode string. What are some strings I should use to stress-test my system and ensure the most nines of users will have a…

unicode stress-testing text-rendering

asked Dec 30 '15 at 22:44

Ky -

30,724
51
192
308

votes

2 answers

Python CSV write to file unreadable in Excel (Chinese characters)

I am trying to performing text analysis on Chinese texts. The program is provided below. I got the result with unreadable characters such as 浜烘皯鏃ユ姤绀捐. And if I change the output file result.csv to result.txt, the characters are correct as 人民日报社论. So…

python excel csv unicode

asked Dec 27 '15 at 15:06

flyingmouse

1,014
3
13
29

Prev 1 2 3

…

99 100 Next