Questions tagged [utf-8]

UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.


UTF-8 is the most widely used character encoding and is recommended for use on the Internet. It is the standard character encoding on Linux and other recent Unix-like operating systems. It was designed to be backwards-compatible with ASCII while still supporting representation of all Unicode code points.

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
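
As a rough illustration of that algorithm, the following Python sketch encodes a single code point by hand and checks the result against Python's built-in `str.encode`. The function name `encode_code_point` is a hypothetical helper chosen for this example, not part of any library; real code should simply call `str.encode("utf-8")`.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper only)."""
    # Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not encodable.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp <= 0x7FF:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    # One sample character for each encoded length, from one to four bytes.
    for ch in ("A", "ä", "€", "😀"):
        encoded = encode_code_point(ord(ch))
        # The hand-rolled result should agree with the built-in codec.
        assert encoded == ch.encode("utf-8")
        hex_bytes = " ".join(f"{b:02x}" for b in encoded)
        print(f"U+{ord(ch):04X} -> {hex_bytes} ({len(encoded)} byte(s))")
```

The one-byte branch returns the code point unchanged, which is why any ASCII text is already valid UTF-8.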

Related tags

The character-encoding tag covers the general concept of character-set encodings.
The unicode character set can be represented in a variety of encodings, one of which is UTF-8.
The ascii tag covers the character set and encoding that UTF-8 generalizes.
Other UTFs: utf-16, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36, utf8mb4

22,178 questions

408

votes

18 answers

Setting the default Java character encoding

How do I properly set the default character encoding used by the JVM (1.5.x) programmatically? I have read that -Dfile.encoding=whatever used to be the way to go for older JVMs. I don't have that luxury for reasons I won't get into. I have…

java utf-8 character-encoding

asked Dec 12 '08 at 05:31

Scott T

394

votes

5 answers

Url decode UTF-8 in Python

In Python 2.7, given a URL like example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0, how can I decode it to the expected result, example.com?title=правовая+защита? I tried…

python encoding utf-8 urldecode

asked May 15 '13 at 13:16

swordholder

4,519
3
18
14

375

votes

14 answers

How to get UTF-8 working in Java webapps?

I need to get UTF-8 working in my Java webapp (servlets + JSP, no framework used) to support äöå etc. for regular Finnish text and Cyrillic alphabets like ЦжФ for special cases. My setup is the following: Development environment: Windows…

java mysql tomcat encoding utf-8

asked Sep 26 '08 at 11:48

kosoant

11,619
7
31
37

364

votes

19 answers

Using PowerShell to write a file in UTF-8 without the BOM

Out-File seems to force the BOM when using UTF-8: $MyFile = Get-Content $MyPath $MyFile | Out-File -Encoding "UTF8" $MyPath How can I write a file in UTF-8 with no BOM using PowerShell? Update 2021 PowerShell has changed a bit since I wrote this…

encoding powershell utf-8 byte-order-mark

asked Apr 08 '11 at 15:02

sourcenouveau

29,356
35
146
243

356

votes

16 answers

How to remove \xa0 from string in Python?

I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but it seems like I'm being left with a lot of \xa0 Unicode representing spaces. Is there an efficient way to remove all of them in Python 2.7, and change them into…

python python-2.7 unicode beautifulsoup utf-8

asked Jun 12 '12 at 09:12

zhuyxn

6,671
9
38
44

354

votes

20 answers

Error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

https://github.com/affinelayer/pix2pix-tensorflow/tree/master/tools An error occurred when compiling "process.py" on the above site. python tools/process.py --input_dir data --operation resize --output_dir data2/resize data/0.jpg ->…

python python-3.x utf-8

asked Feb 20 '17 at 08:43

pie

3,849
2
11
9

331

votes

26 answers

Detect encoding and make everything UTF-8

I'm reading out lots of texts from various RSS feeds and inserting them into my database. Of course, there are several different character encodings used in the feeds, e.g. UTF-8 and ISO 8859-1. Unfortunately, there are sometimes problems with the…

php encoding utf-8 character-encoding

asked May 26 '09 at 13:50

caw

30,999
61
181
291

317

votes

23 answers

"Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC?

This is how my connection is set: Connection conn = DriverManager.getConnection(url + dbName + "?useUnicode=true&characterEncoding=utf-8", userName, password); And I'm getting the following error when trying to add a row to a table: Incorrect string…

mysql jdbc utf-8 utf8mb4

asked Jun 08 '12 at 23:46

Lior

5,454
8
30
38

310

votes

12 answers

How do I check if a string is unicode or ascii?

What do I have to do in Python to figure out which encoding a string has?

python unicode encoding utf-8

asked Feb 13 '11 at 22:27

TIMEX

259,804
351
777
1,080

305

votes

6 answers

u'\ufeff' in Python string

I got an error with the following exception message: UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 155: ordinal not in range(128) Not sure what u'\ufeff' is, it shows up when I'm web scraping. How can I remedy the…

python unicode utf-8

asked Jul 28 '13 at 20:02

James Hallen

4,534
4
23
28

296

votes

17 answers

How to use UTF-8 in resource properties with ResourceBundle

I need to use UTF-8 in my resource properties using Java's ResourceBundle. When I enter the text directly into the properties file, it displays as mojibake. My app runs on Google App Engine. Can anyone give me an example? I can't get this to work.

java google-app-engine utf-8 internationalization resourcebundle

asked Jan 11 '11 at 16:27

nacho

2,961
3
15
3

295

votes

5 answers

UTF-8: General? Bin? Unicode?

I'm trying to figure out what collation I should be using for various types of data. 100% of the content I will be storing is user-submitted. My understanding is that I should be using UTF-8 General CI (Case-Insensitive) instead of UTF-8 Binary.…

mysql utf-8 collation

asked Feb 26 '10 at 19:03

Dolph

49,714
13
63
88

273

votes

11 answers

UTF-8 byte[] to String

Let's suppose I have just used a BufferedInputStream to read the bytes of a UTF-8 encoded text file into a byte array. I know that I can use the following routine to convert the bytes to a string, but is there a more efficient/smarter way of doing…

java utf-8

asked Dec 14 '11 at 21:46

skeryl

5,225
4
26
28

260

votes

11 answers

PHP DOMDocument loadHTML not encoding UTF-8 correctly

I'm trying to parse some HTML using DOMDocument, but when I do, I suddenly lose my encoding (at least that is how it appears to me). $profile = "various japanese characters"; $dom = new DOMDocument(); $dom->loadHTML($profile);…

php utf-8 character-encoding

asked Nov 21 '11 at 20:37

Slightly A.

2,795
2
16
10

247

votes

8 answers

Write to UTF-8 file in Python

I'm really confused with the codecs.open function. When I do: file = codecs.open("temp", "w", "utf-8") file.write(codecs.BOM_UTF8) file.close() It gives me the error UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal…

python utf-8 character-encoding byte-order-mark

asked Jun 01 '09 at 09:42

John Jiang

11,069
12
51
60
