Questions tagged [utf-8]

UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.

UTF-8 is a character encoding that can describe the set of Unicode code points in byte sequences of one to four bytes.

UTF-8 is the most widely used character encoding, and is recommended for use on the Internet. It is the standard character encoding on Linux and other recent Unix-like operating systems. It was designed to be backwards-compatible with ASCII while still supporting representation of all Unicode code points.

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
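
As a rough illustration of the one-to-four-byte layout, here is a minimal Python sketch that encodes a single code point by hand. The function name `encode_utf8` is hypothetical and exists only for this example; it is a simplified reading of the bit patterns in RFC 3629, not a reference implementation. In real code, Python's built-in `str.encode("utf-8")` is the tool to use.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch only)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")

    # 1 byte: 0xxxxxxx (identical to ASCII)
    if code_point <= 0x7F:
        return bytes([code_point])

    # 2 bytes: 110xxxxx 10xxxxxx
    if code_point <= 0x7FF:
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])

    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    if code_point <= 0xFFFF:
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])

    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


if __name__ == "__main__":
    # Compare the hand-rolled encoder with Python's built-in codec.
    for ch in ("A", "ç", "€", "😀"):
        assert encode_utf8(ord(ch)) == ch.encode("utf-8")
        print(ch, encode_utf8(ord(ch)).hex(" "))
```

The first branch shows why ASCII text is already valid UTF-8: code points up to 0x7F are emitted as a single unchanged byte.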

22178 questions
9
votes
1 answer

Which code set is /etc/passwd stored in? Can it be UTF-8? What limits are placed on user names?

On a modern Unix or Linux system, how can you tell which code set the /etc/passwd file stores user names in? Are user names allowed to contain accented characters (from the range 0x80..0xFF in, say, ISO 8859-1 or 8859-15)? Can the /etc/passwd file…
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
9
votes
2 answers

Are IRIs valid as HTML attribute values?

Is it valid HTML to use IRIs containing non-ASCII characters as attribute values (e.g. for href attributes) instead of URIs? Are there any differences among the HTML flavors (HTML and XHTML, 4 and 5)? At least RFC 3986 seems to imply that it…
lxgr
  • 3,719
  • 7
  • 31
  • 46
9
votes
1 answer

php exec() in unicode mode?

I need to execute command-line commands and tools that accept UTF-8 as input or generate UTF-8 output. So I use cmd and it works, but when I try this from PHP with exec it doesn't work. To make it simple I tried simple output redirection. When I write…
code_angel
  • 1,537
  • 1
  • 11
  • 21
9
votes
1 answer

UTF-8 Filenames return Not Found in linux terminal

I have a problem with some files in the Linux (Ubuntu) terminal that have accents in their names. For example: $ ls dir/ criação.png So the terminal returns that file, which means it exists. Now let's check whether the file exists with this simple command: $ [ -f…
9
votes
2 answers

Java UTF-8 filenames with IBM JVM (AIX)

I'm having trouble understanding the way the IBM JVM's implementation of java.io.File deals with UTF-8 on AIX on the JFS2 filesystem. I suspect there's a system property that I'm overlooking, but I have not yet been able to find it. Let's assume I…
Edward Thomson
  • 74,857
  • 14
  • 158
  • 187
9
votes
4 answers

HTML5 page language, direction and encoding

What is the correct way of declaring an HTML5 page to be in Hebrew, RTL and UTF-8 encoded? I haven't done it in a while, but I remember that in HTML4 it involved 3 or 4 tags and attributes that seemed redundant. Is it still the same?
Baruch
  • 20,590
  • 28
  • 126
  • 201
9
votes
4 answers

Vertical tmux borders dashed only when using iTerm

At my new job I'll need to use a mac, and I'm trying to use tmux with iTerm version 2. While horizontal borders appear to be displayed with the proper ACS box-drawing characters[1], the vertical borders are dashed. This is not a problem in…
Tammer Ibrahim
  • 193
  • 1
  • 5
9
votes
3 answers

How to replace all non-alphabetic characters with UTF-8 support in PHP

I want to remove all non-alphabetic characters from a string. The problem is that I don't know the letter range because it is a UTF-8 string. It can be ENGLISH, ՀԱՅԵՐԵՆ, ქართული, УКРАЇНСЬКИЙ, РУССКИЙ. I usually do something like this: $str =…
Mirko Akov
  • 2,357
  • 19
  • 19
9
votes
1 answer

PHP, convert UTF-8 to ASCII 8-bit

I'm trying to convert a string from UTF-8 to ASCII 8-bit by using the iconv function. The string is meant to be imported into accounting software (some basic instructions parsed according to SIE standards). What I'm running now: iconv("UTF-8",…
Daniel
  • 3,726
  • 4
  • 26
  • 49
9
votes
3 answers

substr with japanese characters issue

I'm echoing Japanese characters fine, but when I try to substr and echo out part of the string it just turns to question marks ��� Note: I set my header to UTF-8 with header('Content-Type: text/html; charset=utf-8'); and made the meta
9
votes
3 answers

Optimal MySQL-configuration (my.cnf)

The following is my default production MySQL configuration file (my.cnf) for a pure UTF-8 setup with InnoDB as the default storage…
knorv
  • 49,059
  • 74
  • 210
  • 294
9
votes
4 answers

Junk character removal in java

If I copy text from Word into a text field, junk characters get inserted. While posting parameters from the JSP page it remains fine. But when getting the parameter in Java it is converted into junk. I have used the following code to eliminate junk before…
user1199657
9
votes
5 answers

How to force XPath to use UTF8?

I have an XHTML document being passed to a PHP app via Greasemonkey AJAX. The PHP app uses UTF8. If I output the POST content straight back to a textarea in the AJAX receiving div, everything is still properly encoded in UTF8. When I try to parse…
Gordon
  • 1,844
  • 4
  • 17
  • 32
9
votes
4 answers

Alternative XML parser for ElementTree to ease UTF-8 woes?

I am parsing some XML with the elementtree.parse() function. It works, except for some UTF-8 characters (single-byte characters above 128). I see that the default parser is XMLTreeBuilder, which is based on expat. Is there an alternative parser that…
Kekoa
  • 27,892
  • 14
  • 72
  • 91
9
votes
3 answers

Java bug? Why extra zero byte in utf8 encoding?

The following code public class CharsetProblem { public static void main(String[] args) { //String str = "aaaaaaaaa"; String str = "aaaaaaaaaa"; Charset cs1 = Charset.forName("ASCII"); Charset cs2 = Charset.forName("utf8"); …
Dims
  • 47,675
  • 117
  • 331
  • 600