Questions tagged [unicode]

Unicode is a standard for the encoding, representation and handling of text. It aims to support every character needed for written text, covering all writing systems, technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

U+0041 A
U+0042 B
U+0043 C
...
U+039B Λ
U+039C Μ
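As a rough illustration of the listing above, the following Python sketch converts between characters and their code points using the built-in `ord` and `chr`. The helper name `code_point_label` is made up for this example and is not part of any library.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    # ord() gives the integer code point; Unicode labels use at least
    # four uppercase hex digits.
    return f"U+{ord(ch):04X}"


# chr() goes the other way: from an integer code point to the character.
for cp in (0x0041, 0x0042, 0x0043, 0x039B, 0x039C):
    ch = chr(cp)
    print(code_point_label(ch), ch)
```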

Unicode Transformation Formats

UTFs describe how to encode code points as byte representations. The most common forms are UTF-8 (which encodes code points as a sequence of one, two, three or four bytes) and UTF-16 (which encodes code points as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
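The rows of the table can be reproduced with Python's built-in codecs. This is only a sketch: `hex_bytes` and `encode_row` are hypothetical helper names, and U+1F600 is added here as an assumed extra example of a code point that needs four bytes in both encodings.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def encode_row(cp: int) -> tuple:
    """Return (label, UTF-8 bytes, UTF-16 big-endian bytes) for a code point."""
    ch = chr(cp)
    return (
        f"U+{cp:04X}",
        hex_bytes(ch.encode("utf-8")),
        # "utf-16-be" fixes the byte order and does not emit a BOM.
        hex_bytes(ch.encode("utf-16-be")),
    )


for cp in (0x0041, 0x039B, 0x1F600):
    print(*encode_row(cp), sep="    ")
```

Code points above U+FFFF are the ones that take four bytes in UTF-16, where they are stored as a surrogate pair.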

UTF FAQ, UTF-16 FAQ, UTF-8 FAQ

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
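To give a feel for what normalization and case rules mean in practice, here is a small Python sketch using the standard `unicodedata` module and `str.casefold`. The function name `same_text` is an assumption made for this example, and comparing NFC forms is just one possible policy.

```python
import unicodedata

composed = "\u00e9"     # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two spellings render identically but differ code point by code point.
print(composed == decomposed)
print(same_text(composed, decomposed))

# Case folding follows Unicode rules, so one character may become two.
print("Stra\u00dfe".casefold())
```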

Latest Version of the Standard

Identifying Characters

For more general information, see the Unicode article on Wikipedia.


24916 questions


3 answers

Regex to Match Horizontal White Spaces

I need a regex in Python2 to match only horizontal white spaces not newlines. \s matches all whitespaces including newlines. >>> re.sub(r"\s", "", "line 1.\nline 2\n") 'line1.line2' \h does not work at all. >>> re.sub(r"\h", "", "line 1.\nline…

regex python-2.7 unicode python-unicode

asked Sep 07 '17 at 12:14

Memduh


4 answers

Convert Array of UnicodeScalar into String in Swift

I have an array of Unicode scalars (type is [UnicodeScalar]) like: let array = [UnicodeScalar("f")!, UnicodeScalar("o")!, UnicodeScalar("o")!] or let array2 = "bar".unicodeScalars How can I efficiently convert these arrays into strings again?…

swift string unicode

asked Jul 25 '17 at 05:00

nacho4d


3 answers

Is it actually possible to store and process individual UTF-8 characters in C? If so, how?

I've written a program in C that breaks words down into syllables, segments and letters. It's working well with ASCII characters but I want to make versions that work for the IPA and Arabic too. I'm having massive problems saving and performing…

c unicode wchar

asked Jun 06 '17 at 19:26

sally2000


2 answers

Devanagari text rendering improperly in PyGame

We have a small web app that we want to convert into something native. Right now, it's got a lot of moving parts (the backend, the browser etc.) and we'd like to convert it into a single tight application. We decided to use PyGame to do this and…

python unicode pygame text-rendering devanagari

asked May 30 '17 at 05:33

Noufal Ibrahim


3 answers

Unicode with knitr and Rmarkdown

Is there a set of best practices or documentation for working with Unicode in knitr and Rmarkdown? I can't seem to get any glyphs to show up properly when knitting a document. For example, this works in the console (in RStudio): > cat("\U2660 …

r unicode knitr r-markdown

asked May 24 '17 at 08:31

user2987808


2 answers

Printing a Unicode Symbol in C

I'm trying to print a Unicode star character (0x2605) in a Linux terminal using C. I've followed the syntax suggested by other answers on the site, but I'm not getting any output: #include #include int main(){ wchar_t star =…

c unicode ncurses

asked May 07 '17 at 17:07

Luke Collins


2 answers

Why is the output of print in python2 and python3 different with the same string?

In python2: $ python2 -c 'print "\x08\x04\x87\x18"' | hexdump -C 00000000 08 04 87 18 0a |.....| 00000005 In python3: $ python3 -c 'print("\x08\x04\x87\x18")' | hexdump -C 00000000 08 04 c2 87 18 0a …

python unicode utf-8

asked Mar 19 '17 at 07:58

lzutao


4 answers

Why isn't the Byte Order Mark emitted from UTF8Encoding.GetBytes?

The snippet says it all :-) UTF8Encoding enc = new UTF8Encoding(true/*include Byte Order Mark*/); byte[] data = enc.GetBytes("a"); // data has length 1. // I expected the BOM to be included. What's up?

c# .net unicode encoding utf-8

asked Jan 07 '09 at 16:00

xyz


4 answers

How do I paste non-ASCII characters into vim?

My terminal emulator is configured for Unicode character encoding and my .vimrc contains the line set encoding=utf-8 but when I try pasting the word "café" into vim, it comes out as "cafÃ©". I can make an "é" in vim by typing Ctrl-vu followed by…

vim unicode encoding utf-8 character-encoding

asked Nov 15 '10 at 15:11

sferik


4 answers

Characters appear as question marks in MySQL

I have a problem saving Unicode characters in MySQL. Initially my database character set was set to latin1 and Unicode strings were saved as question marks. After doing some research I added the following lines to…

mysql unicode utf8mb4

asked Dec 08 '16 at 16:12

yinjia


6 answers

Delphi WideString and Delphi 2009+

I am writing a class that will save wide strings to a binary file. I'm using Delphi 2005 for this but the app will later be ported to Delphi 2010. I'm feeling very unsure here, can someone confirm that: A Delphi 2005 WideString is exactly the same…

delphi unicode delphi-2010

asked Nov 04 '10 at 12:34

David


7 answers

findstr or grep that autodetects character encoding (UTF-16)

I want to do this: findstr /s /c:some-symbol * or the grep equivalent grep -R some-symbol * but I need the utility to autodetect files encoded in UTF-16 (and friends) and search them appropriately. My files even have the byte-ordering mark…

unicode windows-xp windows-vista utf-16 findstr

asked Jan 02 '09 at 21:28

David Martin


2 answers

Sequence of logical OR in ES6/Unicode regular expression in Chrome ✗ vs Firefox ✓

Consider the following Unicode-heavy regular expression (emoji standing in for non-ASCII and extra-BMP characters): ''.match(/||/ug) Firefox returns [ "", "", "", "", "", "" ] . Chrome 52.0.2743.116 and Node 6.4.0 both return null! It doesn’t seem…

javascript regex node.js google-chrome unicode

asked Aug 25 '16 at 18:41

Ahmed Fasih


1 answer

Matching Unicode word boundaries in Python

In order to match the Unicode word boundaries [as defined in the Annex #29] in Python, I have been using the regex package with flags regex.WORD | regex.V1 (regex.UNICODE should be default since the pattern is a Unicode string) in the following…

python regex python-3.x unicode

asked Aug 24 '16 at 20:45

ewcz


4 answers

Unable to translate bytes [FC] at index 35 from specified code page to Unicode

I'm trying to send an object like this to my REST API (built with ASP.NET Core) { "firstName":"tersü", "lastName":"asda" } And this is how the headers from SoapUI look: Accept-Encoding: gzip,deflate Content-Type:…

asp.net .net unicode asp.net-core asp.net-core-mvc

asked Aug 22 '16 at 09:41

DVM

