Questions tagged [unicode]

Unicode is a standard for encoding, representing and handling text. It aims to support every character needed for written text across all writing systems, including technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

U+0041 A
U+0042 B
U+0043 C
...
U+039B Λ
U+039C Μ
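As a minimal sketch of the mapping above, Python's built-in `ord` and `chr` convert between a character and its code point. The helper name `code_point_label` is made up for this illustration and is not part of any library.

```python
import unicodedata


def code_point_label(ch: str) -> str:
    """Format a single character's code point in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# The same characters as in the list above; the Greek letters are written
# as escapes so the source file itself stays pure ASCII.
for ch in "ABC\u039b\u039c":
    # Print the label and the character's official Unicode name.
    print(code_point_label(ch), unicodedata.name(ch))

# chr() goes the other way: from a code point back to the character.
restored = chr(0x039B)
```

The label is only a notation for the integer code point; it says nothing yet about how the character is stored as bytes.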

Unicode Transformation Formats

UTFs describe how to encode code points as byte sequences. The most common forms are UTF-8 (which encodes each code point as one, two, three or four bytes) and UTF-16 (which encodes each code point as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
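The rows of the table can be reproduced with Python's `str.encode`, as a rough sketch. The helper `hex_bytes` is a hypothetical name introduced here, and U+1F600 is an extra example (not in the table) chosen to show the four-byte case in both encodings.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Encode text and return its bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


# "utf-16-be" is big-endian UTF-16 without a byte order mark,
# matching the right-hand column of the table.
for ch in ["A", "\u039b", "\U0001F600"]:
    label = f"U+{ord(ch):04X}"
    print(label, "|", hex_bytes(ch, "utf-8"), "|", hex_bytes(ch, "utf-16-be"))
```

Code points above U+FFFF take four bytes in UTF-16 because they are stored as a surrogate pair of two 16-bit units.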

UTF FAQ, UTF-16 FAQ, UTF-8 FAQ

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
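To illustrate two of these operations, here is a small sketch using Python's standard `unicodedata` module and `str.casefold`. It covers only normalization and case handling; locale-aware sorting needs a collation library and is not shown. The variable names are illustrative.

```python
import unicodedata

# "é" can be one precomposed code point, or "e" plus a combining accent.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two strings render the same but compare unequal as code point sequences.
equal_raw = composed == decomposed

# Normalizing both to the same form (NFC here) makes them comparable.
equal_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# Case rules are not always one-to-one: German sharp s uppercases to "SS",
# and casefold() gives a form intended for caseless comparison.
upper_sharp_s = "\u00df".upper()
folded = "Stra\u00dfe".casefold()

print(equal_raw, equal_nfc, upper_sharp_s, folded)
```

Normalizing input to a single form before comparing or storing it is a common way to avoid mismatches between visually identical strings.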

Latest Version of the Standard

Identifying Characters

For more general information, see the Unicode article on Wikipedia.

24916 questions

6 answers

Delphi XE - should I use String or AnsiString?

I finally upgraded to Delphi XE. I have a library of units where I use strings to store plain ANSI characters (chars between A and U). I am 101% sure that I will never ever use UNICODE characters in those places. I want to convert all other…

delphi unicode

asked May 18 '11 at 18:47

Gabriel

4 answers

Java regex always fails

I have a Java regex pattern and a sentence I'd like to completely match, but for some sentences it erroneously fails. Why is this? (for simplicity, I won't use my complex regex, but just ".*") System.out.println(Pattern.matches(".*",…

java regex unicode

asked May 12 '11 at 12:46

Zom-B

4 answers

Are you fluent in Unicode yet?

Almost 5 years ago Joel Spolsky wrote this article, "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". Like many, I read it carefully, realizing it was high-time I got to…

language-agnostic unicode internationalization ascii

asked Sep 12 '08 at 14:21

Ash

4 answers

Read a file with unicode characters

I have an asp.net c# page and am trying to read a file that has the following character ’ and convert it to '. (From slanted apostrophe to apostrophe). FileInfo fileinfo = new FileInfo(FileLocation); string content =…

c# asp.net unicode

asked Apr 27 '11 at 00:46

chris

5 answers

Help me understand why Unicode only works sometimes with Python

Here's a little program: #!/usr/bin/env python # -*- encoding: utf-8 -*- print('abcd kΩ ☠ °C √Hz µF ü ☃ ♥') print(u'abcd kΩ ☠ °C √Hz µF ü ☃ ♥') On Ubuntu, Gnome terminal, IPython does what I would expect: In [6]: run Unicodetest.py abcd kΩ ☠ °C…

python unicode windows-7 ubuntu ipython

asked Apr 17 '11 at 18:15

endolith

5 answers

Japanese/chinese email addresses?

I'm making a site which must be fully Unicode. The database etc. are working; I only have a small logic error. I'm testing my registration form with Ajax to check whether the fields are valid, and in the email field I check with regular expressions. However, if a user has an email…

php regex unicode

asked Apr 17 '11 at 15:39

Writecoder

2 answers

Why isn't there a "Medium Small Black Circle" in Unicode

I know this is maybe off-topic on SO, but I don't know where else to ask. The Unicode blocks Miscellaneous Symbols and Miscellaneous Symbols and Arrows contain these characters: HEAVY LARGE CIRCLE (U+2B55) ⭕ (before emojis it used to look like…

unicode

asked Jun 11 '19 at 09:07

m93a

2 answers

Javascript unicode (greek) regular expressions

I would like to use this regular expression new RegExp("\b"+pat+"\b") in Greek text, but the "\b" metacharacter supports only ASCII characters. I tried the XRegExp library but I didn't manage to solve the issue. Any suggestions would be greatly…

javascript regex unicode character-properties xregexp

asked Apr 13 '11 at 13:33

kylito

5 answers

Get Unicode characters with charcode values greater than hex `FFFF`

Issue The ChrW charcode argument is a Long that identifies a character, but doesn't allow values greater than 65535 (hex value &HFFFF) - see MS Help. For instance Miscellaneous symbols and pictographs can be found at Unicode hex block 1F300-1F5FF.…

excel vba unicode

asked May 06 '19 at 15:25

T.M.

3 answers

How to Convert a javascript object to utf-8 Blob for download?

I've been trying to find a solution that works but couldn't find one. I have an object in JavaScript and it has some non-English characters in it. I'm trying the following code to convert the object to a blob for download. When I click to download…

javascript json unicode encoding blob

asked Dec 26 '18 at 07:54

Loves2Develop

5 answers

How to parse unicode strings with minidom?

I'm trying to parse a bunch of xml files with the library xml.dom.minidom, to extract some data and put it in a text file. Most of the XMLs go well, but for some of them I get the following error when calling…

python unicode minidom

asked Mar 16 '11 at 18:02

dariopy

2 answers

Using unicode characters as shape

I'd like to use Unicode characters as the shape of plots in ggplot, but for some unknown reason they're not rendering. I did find a similar query here, but I can't make the example there work either. Any clues as to why? Note that I don't want to use…

r ggplot2 unicode

asked Oct 20 '18 at 06:14

Laserhedvig

2 answers

R write.csv with UTF-16 encoding

I'm having trouble outputting a data.frame using write.csv using UTF-16 character encoding. Background: I am trying to write out a CSV file from a data.frame for use in Excel. Excel Mac 2011 seems to dislike UTF-8 (if I specify UTF-8 during text…

r unicode csv character-encoding utf-16

asked Mar 10 '11 at 23:15

Daniel Dickison

3 answers

Using middle-dot ASCII with proper support?

I'm using the middle dot - · - a lot in my website. The ASCII is ·, which works fine. However, there are still some problems with some users not seeing the symbol. Is there a very close but more widely supported symbol like this, or is there a…

html unicode ascii special-characters

asked Mar 04 '11 at 21:06

AKor

5 answers

How do you match accented and tilde characters in a perl regular expression (regexp)?

A user enters a set of names with accents and tildes: Renato Núñez, David DeJesús, and Edwin Encarnación My database has anglicized names for these people @names = ('Renato Nunez','David DeJesus','Edwin Encarnacion'); I wish to do a regexp match…

regex perl unicode localization

asked Mar 01 '11 at 16:15

Sean
