Questions tagged [unicode]

Unicode is a standard for encoding, representing and handling text. It aims to support every character needed for written text across all writing systems, including technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

U+0041 A
U+0042 B
U+0043 C
...
U+039B Λ
U+039C Μ
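The mapping between characters and code points can be inspected from most languages. The following is a minimal Python sketch, not part of the tag wiki itself; the helper name `describe` is made up for illustration, and it relies only on the standard `ord`, `chr` and `unicodedata.name` functions.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    # ord() gives the code point as an integer; the U+ notation is that
    # integer in hexadecimal, padded to at least four digits.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in "ABC\u039b\u039c":
        print(describe(ch), ch)

    # chr() goes the other way, from code point to character.
    print(chr(0x0041))
```

The character names come from the Unicode Character Database bundled with the Python build in use, so very recently added characters may be unknown to an older interpreter.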

Unicode Transformation Formats

UTFs describe how to encode code points as byte representations. The most common forms are UTF-8 (which encodes code points as a sequence of one, two, three or four bytes) and UTF-16 (which encodes code points as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
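The table above can be reproduced with a few lines of code. This is an illustrative Python sketch under the assumption that the standard `utf-8` and `utf-16-be` codecs are used; the helper names `hex_bytes` and `encodings` are hypothetical. The last call uses U+1F600, a code point outside the Basic Multilingual Plane, to show the four-byte forms.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def encodings(ch: str) -> tuple:
    """Return the (UTF-8, UTF-16 big-endian) byte forms of one character."""
    # "utf-16-be" writes big-endian code units without a byte order mark.
    return hex_bytes(ch.encode("utf-8")), hex_bytes(ch.encode("utf-16-be"))


if __name__ == "__main__":
    for ch in "ABC\u039b\u039c\U0001F600":
        utf8, utf16 = encodings(ch)
        print(f"U+{ord(ch):04X}  {utf8:<12} {utf16}")
```

Using the plain `utf-16` codec instead would prepend a byte order mark and use the platform's byte order, which is why the sketch names the endianness explicitly.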

UTF FAQ, UTF-16 FAQ, UTF-8 FAQ

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
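As a small illustration of normalization and case handling, here is a Python sketch using only the standard library (`unicodedata.normalize` and `str.casefold`). The helper name `same_text` is hypothetical, and the sketch does not cover collation or locale-specific rules.

```python
import unicodedata

# The same visible text, written two ways.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(composed == decomposed)           # raw comparison of code points
    print(same_text(composed, decomposed))  # comparison after normalization

    # casefold() applies Unicode case folding for caseless matching,
    # which can change the length of a string.
    print("Stra\u00dfe".casefold())
```

Normalizing at the boundaries of a program (on input and before comparison) is one common design choice; whether NFC or NFD is appropriate depends on the data and the systems being exchanged with.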

Latest Version of the Standard

Identifying Characters

For more general information, see the Unicode article on Wikipedia.

24916 questions

6 answers

Writing utf16 to file in binary mode

I'm trying to write a wstring to file with ofstream in binary mode, but I think I'm doing something wrong. This is what I've tried: ofstream outFile("test.txt", std::ios::out | std::ios::binary); wstring hello = L"hello"; outFile.write((char *)…

c++ unicode utf-16

asked Oct 16 '08 at 07:17

Cactuar

5 answers

PHP function imagettftext() and unicode

I'm using the PHP function imagettftext() to convert text into a GIF image. The text I am converting has Unicode characters including Japanese. Everything works fine on my local machine (Ubuntu 7.10), but on my webhost server, the Japanese…

php unicode gd

asked Oct 13 '08 at 15:33

gerdemb

2 answers

How can I replace UTF-8 errors in Ruby without converting to a different encoding?

In order to convert a string to UTF-8 and replace all encoding errors, you can do: str.encode('utf-8', :invalid=>:replace) The only problem with this is it doesn't work if str is already UTF-8, in which case any errors remain: irb> x =…

ruby string unicode encoding utf-8

asked Oct 03 '13 at 16:21

Matt

3 answers

strange UnicodeDecodeError on django

Was doing a fresh install of my vagrant box and my dev environment and when trying to run my Django project I get the following error. Any ideas what's going on? ---------------------------------------- [21/Sep/2013 23:44:03] code 400, message Bad…

python django http unicode

asked Sep 22 '13 at 04:48

jzkelter

1 answer

Set up Notepad++ and NppExec to print unicode characters from python

I have a UTF-8 encoded file cjk.py: print("打印") Unsurprisingly, running python cjk.py yields Traceback (most recent call last): File "cjk.py", line 1, in <module> print('\u6253\u5370') File "C:\Python33\lib\encodings\cp850.py", line 19, in…

python unicode notepad++ nppexec

asked Aug 23 '13 at 13:50

Clément

3 answers

UnicodeDecodeError: unexpected end of data

I have a huge text file which I want to open. I'm reading the file in chunks, avoiding memory issues related to reading too much of the file all at once. code snippet: def open_delimited(fileName, args): with open(fileName, args,…

unicode python-3.x

asked Aug 21 '13 at 12:39

Presen

7 answers

The encoding 'UTF-8' is not supported by the Java runtime

Whenever I start our Apache Felix (OSGi) based application under Sun Java (build 1.6.0_10-rc2-b32 and other 1.6.x builds), I see the following message output on the console (usually under Ubuntu 8.4): Warning: The encoding 'UTF-8' is not supported…

java linux unicode

asked Oct 07 '08 at 04:34

Mark Derricutt

1 answer

How to convert a char to its full Unicode name?

I need functions to convert between a character (e.g. 'α') and its full Unicode name (e.g. "GREEK SMALL LETTER ALPHA") in both directions. The solution I came up with is to perform a lookup in the official Unicode Standard available online:…

c# .net string unicode

asked Jun 25 '13 at 19:04

Oksana Gimmel

2 answers

How to query MySQL for fields containing null characters

I have a MySQL table with a text column. Some rows have null characters (0x00) as part of this text column (along with other characters). I am looking for a query that will return all rows containing any null characters for this column, but I…

mysql unicode character-encoding escaping

asked Jun 08 '13 at 00:42

CJS

2 answers

Ruby's String#gsub, unicode, and non-word characters

As part of a larger series of operations, I'm trying to take tokenized chunks of a larger string and get rid of punctuation, non-word gobbledygook, etc. My initial attempt used String#gsub and the \W regexp character class, like so: my_str =…

ruby regex unicode

asked Oct 26 '09 at 22:42

Steven Bedrick

2 answers

Unicode Encoding and decoding issues in QRCode

I am trying to generate a UTF-8 QR code so that I can encode accents and Unicode characters. To test it, I am using several decoding solutions: http://zxing.org/w/decode.jspx - The zxing project also used in…

unicode encoding character-encoding decoding qr-code

asked Oct 23 '09 at 08:23

Natim

2 answers

Allowed characters in CSS 'content' property?

I've read that we must use Unicode values inside the content CSS property i.e. \ followed by the special character's hexadecimal number. But what characters, other than alphanumerics, are actually allowed to be placed as is in the value of content…

unicode css

asked Mar 26 '13 at 17:37

its_me

4 answers

Java Can't Open a File with Surrogate Unicode Values in the Filename?

I'm dealing with code that does various IO operations with files, and I want to make it able to deal with international filenames. I'm working on a Mac with Java 1.5, and if a filename contains Unicode characters that require surrogates, the JVM…

java file unicode filenames surrogate-pairs

asked Oct 09 '09 at 19:21

Bear

3 answers

Why does Java use modified UTF-8 instead of UTF-8?

Why does Java use modified UTF-8 rather than standard UTF-8 for object serialization and JNI? One possible explanation is that modified UTF-8 can't have embedded null characters and therefore one can use functions that operate on null-terminated…

java unicode utf-8 java-native-interface

asked Mar 15 '13 at 19:26

vitaut

1 answer

Issue about 65533 � in C# text file reading

I created a sample app to load all special characters when copy-pasting from OpenOffice Writer to Notepad. The double codes differ when I try to load this: var lines = File.ReadAllLines("..\\ter34.txt"); This creates the 65533 problem. The issue comes…

c# unicode

asked Feb 22 '13 at 10:42

Aravind Srinivas
