Questions tagged [utf-8]

UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.

UTF-8 is the most widely used character encoding and is recommended for use on the Internet. It is the standard character encoding on Linux and other recent Unix-like operating systems. It was designed to be backwards-compatible with ASCII while still supporting representation of all Unicode code points.

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
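
As a rough illustration of the one-to-four-byte layout that RFC 3629 describes, here is a minimal Python sketch. The function name `encode_utf8` is hypothetical and chosen only for this example; in practice Python's built-in `str.encode("utf-8")` does the same job and should be preferred.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 bytes (illustrative sketch)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates are reserved for UTF-16 and are not valid in UTF-8.
        raise ValueError("surrogate code points cannot be encoded in UTF-8")

    if code_point <= 0x7F:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    if code_point <= 0xFFFF:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


if __name__ == "__main__":
    # Compare the sketch against Python's built-in encoder.
    for ch in ["A", "é", "世", "\U0001F600"]:
        print(repr(ch), encode_utf8(ord(ch)).hex(), ch.encode("utf-8").hex())
```

Code points up to U+007F come out as a single byte equal to their ASCII value, which is what makes the encoding backwards-compatible with ASCII.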

Related tags

The character-encoding tag discusses the general concept of character-set encodings.
The unicode character set can be represented in a variety of encodings, one of which is UTF-8.
The ascii character set and encoding, which UTF-8 generalizes.
Other UTFs: utf-16, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36, utf8mb4.

22178 questions

votes

4 answers

How to convert a UTF-8 string into Unicode?

I have a string that displays UTF-8 encoded characters, and I want to convert it back to Unicode. For now, my implementation is the following: public static string DecodeFromUtf8(this string utf8String) { // read the string as UTF-8 bytes. …

c# string unicode utf-8

asked Jul 02 '12 at 12:47

remio

1,242
2
15
36

votes

2 answers

Character encoding with Ruby 1.9.3 and the mail gem

I'm trying to parse email strings with the Ruby mail gem, and I'm having a devil of a time with character encodings. Take the following email: MIME-Version: 1.0 Sender: foobar@example.com Received: by 10.142.239.17 with HTTP; Thu, 14 Jun 2012…

ruby email utf-8 character-encoding

asked Jun 14 '12 at 18:51

Micah

17,584
8
40
46

votes

1 answer

perl: convert a string to utf-8 for json decode

I'm crawling a website and collecting information from its JSON. The results are saved in a hash. But some of the pages give me "malformed UTF-8 character in JSON string" error. I notice that the last letter in "cafe" will produce error. I think it…

json perl utf-8

asked May 22 '12 at 18:57

Ivan Wang

8,306
14
44
56

votes

3 answers

python regular expression with utf8 issue

I got a file which includes many lines of plain utf-8 text. Such as below, by the by, it's Chinese. PROCESS：类型：关爱积分[NOTIFY] 交易号：2012022900000109 订单号：W12022910079166 交易金额：0.01元交易状态：true 2012-2-29 10:13:08 The file itself was saved in…

python regex utf-8 python-2.7

asked May 11 '12 at 06:25

castiel

2,675
5
29
38

votes

3 answers

Why can't I write Chinese characters in nodejs HTTP response?

Here is my little code: var http = require('http'); var port = 9002; var host_ip = ''; http.createServer(function (req, res) { var content = new Buffer("Hello 世界", "utf-8") console.log('request arrived'); res.writeHead(200, { …

node.js utf-8 cjk

asked May 06 '12 at 12:20

Allan Ruin

5,229
7
37
42

votes

4 answers

Why is DOCTYPE line red in firefox?

The websites I've designed had no problem before but now I see DOCTYPE line red in Firefox 11. There is no problem in validation. I changed encoding to UTF-8 without BOM but problem still…

validation firefox utf-8 doctype

asked Apr 07 '12 at 10:41

HasanG

12,734
29
100
154

votes

5 answers

"an integer is required" when open()'ing a file as utf-8?

I have a file I'm trying to open up in python with the following line: f = open("C:/data/lastfm-dataset-360k/test_data.tsv", "r", "utf-8") Calling this gives me the error TypeError: an integer is required I deleted all other code besides that one…

python utf-8

asked Apr 01 '12 at 23:29

Jim

4,509
16
50
80

votes

3 answers

How do I convert a UTF-8 string to upper case?

Is there a portable way to convert a UTF-8 string in C to upper case? If not, what is the Linux way to do it?

c utf-8

asked Mar 29 '12 at 16:03

August Karlstrom

10,773
7
38
60

votes

2 answers

Storing Chinese, Korean, English, etc in MS SQL through SQL Express

I am using MS SQL 2008 Express to connect to a shared MS SQL 2008 server where I have a database. The default collation for the DB is currently SQL_Latin1_General_CP1_CI_AS. Ultimately, I would like to store English, Korean, Chinese, and any other…

php sql-server-2008 utf-8 character-encoding collation

asked Mar 28 '12 at 00:17

gcdev

1,406
3
17
30

votes

5 answers

UTF-8 problem in python when reading chars

I'm using Python 2.5. What is going on here? What have I misunderstood? How can I fix it? in.txt: Stäckövérfløw code.py #!/usr/bin/env python # -*- coding: utf-8 -*- print """Content-Type: text/plain; charset="UTF-8"\n""" f = open('in.txt','r') for…

python utf-8

asked Jun 12 '09 at 07:39

jacob

1,214
2
13
22

votes

2 answers

Is UTF-8 the encoding of choice for QR codes with non-ASCII chars by now?

Google uses UTF-8 as the default for their very popular encoder. From what I can see they don't even add the byte order mark. The problem is that most scanners still seem to use JIS8 (QR 2000) instead of iso-8859 (QR 2005) as default, so it mostly…

encoding utf-8 character-encoding qr-code iso-8859-1

asked Mar 14 '12 at 10:00

Gonzo

2,023
3
21
30

votes

1 answer

kdiff3 does not show utf8

I am using kdiff3 with TortoiseHg. When merging a file in UTF-8 encoding, kdiff3 shows all non-Latin text like "СЃРєР»Р°Рґ". How can I fix this?

mercurial utf-8 merge

asked Mar 13 '12 at 16:27

Andrew G

votes

2 answers

Removing invalid/incomplete multibyte characters

I'm having some issues using the following code on user input: htmlentities($string, ENT_COMPAT, 'UTF-8'); When an invalid multibyte character is detected PHP throws a notice: PHP Warning: htmlentities(): Invalid multibyte sequence in argument in…

php utf-8 iconv

asked Mar 09 '12 at 08:59

Dean

5,884
2
18
24

votes

3 answers

How to parse UTF-8 representation to String in Java?

Given the following code: String tmp = new String("\\u0068\\u0065\\u006c\\u006c\\u006f\\u000a"); String result = convertToEffectiveString(tmp); // result contain now "hello\n" Does the JDK already provide some classes for doing this ? Is there a…

java utf-8 ascii

asked Feb 15 '12 at 01:39

Stephan

41,764
65
238
329

votes

3 answers

What's a good terminator byte for UTF-8 data?

I have a need to manipulate UTF-8 byte arrays in a low-level environment. The strings will be prefix-similar and kept in a container that exploits this (a trie.) To preserve this prefix-similarity as much as possible, I'd prefer to use a…

unicode utf-8

asked Jan 18 '12 at 20:12

phs

10,687
4
58
84
