Questions tagged [utf-8]

UTF-8 is a variable-width character encoding of the Unicode character set that represents each code point as one to four bytes. Unlike some other encodings such as UTF-16, UTF-8 is backward compatible with 7-bit ASCII, and can be processed to some degree by applications that are only aware of bytes.
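
As a rough illustration of the byte-level behaviour described above, the following Python sketch counts how many bytes UTF-8 uses for a few characters and checks that pure ASCII text encodes to identical bytes in both encodings. The helper name `utf8_lengths` and the sample characters are assumptions made for this example only.

```python
def utf8_lengths(text):
    """Return (character, number of UTF-8 bytes) pairs for each character."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# Sample characters (chosen for illustration): "A", e-acute, euro sign, an emoji.
sample = "A\u00e9\u20ac\U0001F600"
for ch, size in utf8_lengths(sample):
    print(f"U+{ord(ch):04X} takes {size} byte(s) in UTF-8")

# Pure ASCII text produces the same bytes under ASCII and UTF-8,
# which is why byte-oriented tools can often handle it unchanged.
ascii_text = "plain ASCII"
print(ascii_text.encode("ascii") == ascii_text.encode("utf-8"))
```

Characters outside the ASCII range always use two or more bytes, each with the high bit set, so they never collide with ASCII bytes.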

Full support of UTF-8 for searching, collation, word parsing, etc., does require support of Unicode concepts such as characters, normalisation, and supplementary characters. Many application and OS problems with "special characters", such as accented European letters or the ideographs used in Japanese and Chinese, derive from mismatched character encodings.
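
A minimal Python sketch of the two issues mentioned above: what a mismatched encoding does to an accented letter, and why normalisation matters when comparing text. The function names `mojibake` and `same_text`, and the choice of Windows-1252 as the wrong encoding, are assumptions for illustration.

```python
import unicodedata


def mojibake(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    # errors="replace" avoids an exception for bytes the wrong encoding lacks.
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


def same_text(a, b):
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two UTF-8 bytes of e-acute are shown as two separate characters.
print(mojibake("caf\u00e9"))

# Precomposed e-acute versus "e" followed by a combining acute accent:
composed = "caf\u00e9"
decomposed = "cafe\u0301"
print(composed == decomposed)           # code points differ
print(same_text(composed, decomposed))  # equal once normalised
```

Garbling of this kind usually means the reader and writer disagree about the encoding, so the fix tends to be making both sides declare and use the same one rather than repairing the text afterwards.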

104 questions
3
votes
2 answers

URL Encoding force UTF-8 in CentOS 6.4

I have a simple html website in a CentOS 6.4 server. In every html page i have set I have already added in a .htaccess file the following line IndexOptions Charset=UTF-8 and in…
segconn
  • 31
  • 1
  • 3
3
votes
1 answer

Can not input or print Chinese on PuTTY

On Red Hat Enterprise Linux AS release 3, I've set my environment variable as below $ echo $LANG zh_CN.UTF-8 $ echo $LANGUAGE zh_CN.UTF-8 $ echo $SUPPORTED en_US.UTF-8:en_US:en:zh_CN.UTF-8 $…
hetaoblog
  • 249
  • 1
  • 3
  • 14
3
votes
3 answers

Mongo Client RedHat EL5 UTF8 Support

# mongo MongoDB shell version: 1.6.4 Fri Mar 16 11:55:46 *** warning: spider monkey build without utf8 support. consider rebuilding with utf8 support connecting to: test Mongo Server seems to handle the utf8 characters fine, as well as my…
Michael
  • 801
  • 1
  • 7
  • 15
3
votes
2 answers

How can I change the default encoding of a tomcat server/container?

I'm having problems with the character encoding of my webapp and would like to know how I can go about changing the default encoding of tomcat on the Linux production server to match the cp 1252 encoding of the dev server on windows (or at least…
Dark Star1
  • 1,385
  • 7
  • 22
  • 37
2
votes
0 answers

Postfix, virtual aliases and UTF8

The Postfix documentation reads By default, Postfix sets the "SMTPUTF8 requested" flag only on address verification probes and on Postfix sendmail submissions that contain UTF-8 in the sender address, UTF-8 in a recipient address, or UTF-8 in a…
user178826
2
votes
2 answers

htaccess redirect changes encoding of HTML response

I've set Apache 2.4 server to AddDefaultCharset utf-8 in httpd.conf and my .htaccess file redirects all non-www and http to https://www.example.com RewriteEngine On RewriteCond %{HTTP_HOST} ^example\.com$ [OR] RewriteCond %{HTTPS} !on RewriteRule…
user46688
  • 176
  • 1
  • 12
2
votes
1 answer

PHP: No mapping for the Unicode character..., for specific greek characters

I have a windows IIS server working with PHP. The user inserts a word via an HTML form, it goes to PHP and then PHP calls a COM dll (vb6) function passing the word to the function as a utf8 string. Everything goes fine, until an input contains…
MirrorMirror
  • 105
  • 2
  • 12
2
votes
1 answer

UTF-8 filenames are ruined when uploading through IIS

When I upload files with UTF-8 file names through IIS on my Windows Server, the file names are ruined. (They are changed as if they were encoded in ASCII and are therefore no longer accessible.) I wonder if there is any workaround for this problem. I…
omidrezav
  • 21
  • 3
2
votes
3 answers

Apache2: Problems matching accented characters in query string using RewriteCond & RewriteRule

Working on a site where the plan is to move URLs from a query string format to a number based format. Lots of URLs exist that have unescaped accented & similar UTF8 characters in them. The problem? I can’t seem to get Apache2 to properly match…
Giacomo1968
  • 3,542
  • 27
  • 38
2
votes
1 answer

Utf-8 encoding for PHP-scripts creates "headers already sent"

If I upload php files that are encoded in UTF-8, instead of ANSI, it creates problems with "headers already sent" on the server. vi shows {feff} at the start and nano does not show the \ at the start. When I remove the {feff} manually in vi, it…
ujjain
  • 3,983
  • 16
  • 53
  • 91
2
votes
3 answers

`less` not able to display special characters

I stumbled upon bad special characters in some manpages: If your terminal is a "true" auto-margin terminal (it doesn▒<80><99>t allow the last position on the screen to be updated without scrolling the screen) consider using a…
d135-1r43
  • 411
  • 4
  • 13
2
votes
1 answer

Setting up trac with PostgreSQL - How to set encoding to UTF8?

So I've been looking to install trac on my Debian server with PostgreSQL. I set up everything as per the docs but when trying to run trac-admin /path initenv I get this error for database encoding: DataError: character 0xe282ac of encoding "UTF8" has…
ingh.am
  • 273
  • 3
  • 15
2
votes
1 answer

DB2 Integrity Checks and Exception Tables

I am working on planning a migration of a DB2 8.1 database from a horrible IBM encoding to UTF-8 to support further languages etc. I am encountering an issue that I am stuck on. A few notes on this migration: We are using db2move to export and load…
2
votes
1 answer

UnrealIRCD codepage doesn't exist?

I am trying to set a utf8 codepage for my ircd server. unrealircd.conf listen *:9999 { codepage "UTF8"; options { ssl; clientsonly; cp_utf; }; }; I am getting some errors: unrealircd.conf Rehashing - Notice…
Kirill Firsov
  • 75
  • 1
  • 5
2
votes
1 answer

Does having no dash in utf-8 in email messages make the charset unreadable?

Can having no dash in utf-8 in email message headers make email clients display text wrong? Subject: Newsletter MIME-Version: 1.0 From: <> Reply-To: <> Content-Type: text/plain; **charset=utf8** Message-Id: <> Sender: www-data <> Date:…
MadBoy
  • 3,725
  • 15
  • 63
  • 94