Questions tagged [unicode]

Unicode is intended to be a universal character set covering every character needed for written text: all writing systems, technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

  • U+0041 A
  • U+0042 B
  • U+0043 C
  • ...
  • U+039B Λ
  • U+039C Μ
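As a rough illustration of the mapping listed above, the following Python sketch converts between characters and their code points using the built-in `ord` and `chr`. The helper name `code_point_label` is made up for this example and is not part of any standard API.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    # ord() gives the code point as an integer; format it as
    # at least four uppercase hexadecimal digits.
    return f"U+{ord(ch):04X}"


# The Latin and Greek capitals from the list above.
for ch in "ABC\u039b\u039c":
    print(code_point_label(ch), ch)

# chr() goes the other way, from code point to character.
print(chr(0x0041))
```
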

Unicode Transformation Formats

Unicode Transformation Formats (UTFs) describe how to encode code points as byte sequences. The most common forms are UTF-8 (which encodes each code point as a sequence of one, two, three or four bytes) and UTF-16 (which encodes each code point as two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
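The rows of the table above can be reproduced with a short Python sketch that relies only on `str.encode`. The helper name `hex_bytes` is hypothetical, and the emoji at the end is an extra example (not in the table) chosen to show the four-byte case in both encodings.

```python
def hex_bytes(ch: str, encoding: str) -> str:
    """Encode one character and return its bytes as spaced uppercase hex."""
    return " ".join(f"{b:02X}" for b in ch.encode(encoding))


# Characters from the table, plus U+1F600 as a four-byte example.
for ch in ["A", "B", "C", "\u039b", "\u039c", "\U0001F600"]:
    label = f"U+{ord(ch):04X}"
    utf8 = hex_bytes(ch, "utf-8")
    utf16 = hex_bytes(ch, "utf-16-be")  # big-endian, no byte order mark
    print(f"{label:<10}{utf8:<14}{utf16}")
```

Using `utf-16-be` rather than plain `utf-16` keeps Python from prepending a byte order mark, which matches the big-endian column of the table.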

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
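To make normalization and case rules a little more concrete, here is a minimal Python sketch using the standard `unicodedata` module and `str.casefold`. The helper `same_text` is a hypothetical name, and comparing NFC-normalized, case-folded strings is a simplification of the caseless matching defined by the standard, not a full implementation of it.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization and case folding.

    A simplified comparison for illustration only.
    """
    def norm(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()

    return norm(a) == norm(b)


precomposed = "\u00e9"   # e with acute as a single code point
combining = "e\u0301"    # "e" followed by a combining acute accent

print(precomposed == combining)           # different code point sequences
print(same_text(precomposed, combining))  # equal after normalization
print(same_text("STRASSE", "stra\u00dfe"))  # case folding expands sharp s to "ss"
```
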

Identifying Characters

45 questions
3
votes
1 answer

Linux support for unicode filenames

I have a couple of Linux fileservers running Samba. What do I need to do to support filenames with Unicode characters? Do particular filesystems have better support for Unicode? Would I get better support by using something other than ext3? What do I…
Zoredache
  • 130,897
  • 41
  • 276
  • 420
2
votes
1 answer

PHP: No mapping for the Unicode character..., for specific greek characters

I have a Windows IIS server working with PHP. The user inserts a word via an HTML form, it goes to PHP, and then PHP calls a COM DLL (VB6) function, passing the word to the function as a UTF-8 string. Everything goes fine, until an input contains…
MirrorMirror
  • 105
  • 2
  • 12
2
votes
1 answer

MS-DOS attrib +s 'path not found'

I have to put some specific icons on folders, and I succeeded with all folders except the ones that have some special characters. I'm using Windows 7. How it is done: creating a .ini file within the folder; running a command-line…
2
votes
2 answers

Imapsync doesn't sync unicode characters properly

I am using imapsync to migrate my e-mail account from a CPanel (courier) mailserver to a Debian/Dovecot one. The issue I am facing has to do with some folders that contain Unicode characters. For example a mailbox folder containing Greek characters…
lefterav
  • 233
  • 2
  • 8
2
votes
2 answers

Re-uploading file with unicode filename creates identical duplicate

I'm running a Django site on a Debian 6 system, with a gunicorn server and nginx 0.7.67 handling static files. The filesystem locale is set to sv_SE.UTF-8. I have a problem where another user uploaded a file with a filename containing Unicode…
Samuel Linde
  • 51
  • 1
  • 4
2
votes
1 answer

Logcheck: wildcards which include non-latin characters

On my mail server, I have a custom logcheck rule as follows, which is intended to filter messages from deliver: ^\w{3} [ :0-9]{11} [._[:alnum:]-]+ deliver\.*\): msgid=.*: saved mail to.*$ Unfortunately, the msgid=.* wildcard does not match if there…
Matt Holgate
  • 131
  • 1
1
vote
0 answers

Hosting emoji domains quandary

I am trying to get IIS 8.5 to process IDN domains, including domains with emoji characters. Since the application runs dynamic hostnames, I bind by IP and protocol only. IIS 8.5 does a fine job of IDN and Punycode reconstruction but only…
Romy
  • 11
  • 2
1
vote
0 answers

Ubuntu Bacula backup chinese (unicode) files

My Bacula 5.0.2 on Ubuntu 10.04 LTS is unable to back up Chinese/Unicode text files. It is not a permission issue, because English text files like test.txt have no issues. Does anyone have any idea how? The error message is Could not stat…
JamesLee
  • 11
  • 1
1
vote
0 answers

Linux unicode/umlauts in URL

We have a website where some pictures are named using Unicode, e.g. wildkräuter2_big.jpg. The problem is that when anybody tries to access one, Apache 2.4 returns a 404 error: $ curl -r 0-99…
setevoy
  • 334
  • 2
  • 4
  • 15
1
vote
1 answer

gunicorn behave differently when run from terminal and by service

I am using nginx+gunicorn+django for my website. The following is my gunicorn.conf file: description "Gunicorn daemon for Django project" start on (local-filesystems and net-device-up IFACE=eth0) stop on runlevel [!12345] # If the process quits…
moonkey
  • 113
  • 4
1
vote
1 answer

Is there a way to force PostgreSQL to use the UTF-7 encoding for the connection?

I have a PostgreSQL database which uses an arcane text encoding, and I can't change that. Is there a way to store text in UTF-7 transparently (for the informed client) so that the database engine doesn't complain that it cannot convert Unicode…
Alexei Averchenko
  • 261
  • 1
  • 2
  • 7
1
vote
1 answer

Problems with vim/locale as non-root user on Solaris

I do some work on a Solaris 10 machine, and my .vimrc is set up to show Unicode characters for tabs and line endings: set listchars=tab:▸\ ,eol:¬ This works out of the box on my OS X machine. On Linux as well as Solaris I get the following error…
Lyle
  • 111
  • 4
1
vote
2 answers

Using Chinese Characters With Mod_Rewrite

I'm trying to create a rule using Chinese characters #RewriteRule ^zh(.*) /中文版$1 [L,R=301] creates error 500 when I change the file to UTF-8 #RewriteRule ^zh(.*) /%E4%B8%AD%E6%96%87%E7%89%88$1 [L,R=301] redirects to…
Moak
  • 734
  • 3
  • 10
  • 31
1
vote
2 answers

Unicode versions of common UNIX text tools to run on Windows

I can't afford the MKS Toolkit (I also think it's a bit overkill for my needs). So I was wondering if anyone knew of the standard set of Unix text tools that can handle Unicode files under Windows. Examples are wc, awk, diff, sort etc.
kingchris
  • 244
  • 4
  • 13
1
vote
0 answers

IIS accepting requests with zero-width characters

We're running an API behind a pair of load-balanced IIS v10 servers that route the request to a pair of Tomcat servers (historical reasons for this architecture). A couple of instances have come up when a request is coming through where the URI…