Questions tagged [utf-8]

UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.


UTF-8 is the most widely used character encoding and is recommended for use on the Internet. It is the standard character encoding on Linux and other recent Unix-like operating systems.

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
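RFC 3629 assigns each code point a byte length by range (up to U+007F: 1 byte, up to U+07FF: 2, up to U+FFFF: 3, up to U+10FFFF: 4) and spreads its bits over a lead byte and 10xxxxxx continuation bytes. The Python sketch below implements that bit layout for a single code point and checks it against Python's built-in codec. The function name utf8_encode is hypothetical. Note that the 1-byte range is identical to ASCII, which is where the backwards compatibility comes from.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 using the RFC 3629 bit layout (sketch)."""
    # RFC 3629 forbids surrogates (U+D800..U+DFFF) and anything above U+10FFFF.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:        # 1 byte:  0xxxxxxx (same as ASCII)
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# "A", "ä", "€" and an emoji: one example from each length class.
for ch in "A\u00e4\u20ac\U0001F600":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with Python's own codec
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```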


22178 questions
408 votes, 18 answers

Setting the default Java character encoding

How do I properly set the default character encoding used by the JVM (1.5.x) programmatically? I have read that -Dfile.encoding=whatever used to be the way to go for older JVMs. I don't have that luxury for reasons I won't get into. I have…
Scott T
394 votes, 5 answers

Url decode UTF-8 in Python

In Python 2.7, given a URL like example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0, how can I decode it to the expected result, example.com?title=правовая+защита? I tried…
swordholder
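The percent escapes in that URL are the UTF-8 bytes of the Cyrillic text, so decoding them as UTF-8 gives the expected result. This sketch uses Python 3's urllib.parse rather than the Python 2.7 the question asks about: unquote() decodes %XX escapes as UTF-8 by default and keeps "+" as is, while unquote_plus() also turns "+" into a space, as form-encoded query strings expect.

```python
from urllib.parse import unquote, unquote_plus

url = ("example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F"
       "+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0")

# unquote(): %XX escapes decoded as UTF-8 (the default), "+" left untouched.
decoded = unquote(url)
# unquote_plus(): same decoding, but "+" becomes a space.
decoded_form = unquote_plus(url)

print(decoded)       # example.com?title=правовая+защита
print(decoded_form)  # example.com?title=правовая защита
```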
375 votes, 14 answers

How to get UTF-8 working in Java webapps?

I need to get UTF-8 working in my Java webapp (servlets + JSP, no framework used) to support äöå etc. for regular Finnish text and Cyrillic alphabets like ЦжФ for special cases. My setup is the following: Development environment: Windows…
kosoant
364 votes, 19 answers

Using PowerShell to write a file in UTF-8 without the BOM

Out-File seems to force the BOM when using UTF-8: $MyFile = Get-Content $MyPath $MyFile | Out-File -Encoding "UTF8" $MyPath How can I write a file in UTF-8 with no BOM using PowerShell? Update 2021 PowerShell has changed a bit since I wrote this…
sourcenouveau
356 votes, 16 answers

How to remove \xa0 from string in Python?

I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but it seems like I'm being left with a lot of \xa0 Unicode representing spaces. Is there an efficient way to remove all of them in Python 2.7, and change them into…
zhuyxn
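\xa0 is U+00A0 NO-BREAK SPACE, which is what HTML's &nbsp; becomes after parsing. A Python 3 sketch of two ways to turn it into an ordinary space (the sample string is made up): a targeted str.replace, or NFKC normalization, which maps U+00A0 to U+0020 but also rewrites other compatibility characters, so it changes more than just spaces.

```python
import unicodedata

# Made-up sample; "\xa0" is U+00A0 NO-BREAK SPACE.
text = "Price:\xa0100\xa0EUR"

# Targeted fix: replace only the no-break spaces.
replaced = text.replace("\xa0", " ")

# Broader fix: NFKC maps U+00A0 to U+0020, but it also rewrites other
# compatibility characters (ligatures, full-width forms and so on).
normalized = unicodedata.normalize("NFKC", text)

print(repr(replaced))  # 'Price: 100 EUR'
```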
354 votes, 20 answers

Error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

https://github.com/affinelayer/pix2pix-tensorflow/tree/master/tools An error occurred when compiling "process.py" on the above site. python tools/process.py --input_dir data --operation resize --output_dir data2/resize data/0.jpg ->…
pie
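The byte 0xFF never occurs in valid UTF-8, and JPEG files begin with the bytes FF D8. This error is therefore what you would expect if an image such as data/0.jpg is being read as UTF-8 text. That diagnosis is an assumption based on the excerpt. The sketch below reproduces the error with illustrative JPEG header bytes and shows that reading the file in binary mode ("rb") skips decoding entirely.

```python
import os
import tempfile

# Illustrative bytes: JPEG files start with the marker FF D8.
jpeg_header = b"\xff\xd8\xff\xe0"

try:
    jpeg_header.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)  # ... can't decode byte 0xff in position 0: invalid start byte

# Binary mode returns raw bytes and never attempts a UTF-8 decode.
with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as tmp:
    tmp.write(jpeg_header)
    path = tmp.name
with open(path, "rb") as f:
    raw = f.read()
os.remove(path)
```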
331 votes, 26 answers

Detect encoding and make everything UTF-8

I'm reading out lots of texts from various RSS feeds and inserting them into my database. Of course, there are several different character encodings used in the feeds, e.g. UTF-8 and ISO 8859-1. Unfortunately, there are sometimes problems with the…
caw
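The question is about PHP, but the usual fallback logic does not depend on the language, so here it is as a Python sketch. It assumes the feeds are either UTF-8 or ISO 8859-1, the two encodings the question names. Strict UTF-8 decoding is tried first because ISO 8859-1 text with accented letters is rarely valid UTF-8 by accident. ISO 8859-1 serves as the fallback because it maps every byte to a character. to_utf8 is a hypothetical helper name. This is a heuristic, not real encoding detection: a feed that declares its charset should be decoded with that charset.

```python
def to_utf8(raw: bytes) -> bytes:
    """Re-encode raw as UTF-8, guessing between UTF-8 and ISO 8859-1 (heuristic sketch)."""
    try:
        text = raw.decode("utf-8")       # strict: raises on any invalid sequence
    except UnicodeDecodeError:
        text = raw.decode("iso-8859-1")  # never fails: every byte maps to U+0000..U+00FF
    return text.encode("utf-8")


utf8_input = "caf\u00e9".encode("utf-8")         # b'caf\xc3\xa9'
latin1_input = "caf\u00e9".encode("iso-8859-1")  # b'caf\xe9'
print(to_utf8(utf8_input) == to_utf8(latin1_input))  # True
```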
317 votes, 23 answers

"Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC?

This is how my connection is set: Connection conn = DriverManager.getConnection(url + dbName + "?useUnicode=true&characterEncoding=utf-8", userName, password); And I'm getting the following error when trying to add a row to a table: Incorrect string…
Lior
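The excerpt cuts off before the full error message, so the following is an assumption. A common cause of "Incorrect string value" is a character that needs four bytes in UTF-8 (code points above U+FFFF, such as most emoji). MySQL's legacy utf8 character set, an alias of utf8mb3, stores at most three bytes per character, so such columns need utf8mb4 instead. A small Python sketch for spotting those characters before inserting (four_byte_chars is a hypothetical helper):

```python
def four_byte_chars(text: str) -> list:
    """Return the characters of text that take four bytes in UTF-8."""
    # Code points above U+FFFF are exactly the ones UTF-8 encodes with four bytes.
    return [ch for ch in text if ord(ch) > 0xFFFF]


sample = "caf\u00e9 \u20ac \U0001F600"  # é (2 bytes), € (3 bytes), emoji (4 bytes)
for ch in sample.replace(" ", ""):
    print(f"U+{ord(ch):04X}: {len(ch.encode('utf-8'))} byte(s)")
print(four_byte_chars(sample))
```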
310 votes, 12 answers

How do I check if a string is unicode or ascii?

What do I have to do in Python to figure out which encoding a string has?
TIMEX
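In Python 3 a str has no encoding at all. It is a sequence of Unicode code points, and only bytes carry an encoding. The sketch below shows what can actually be checked. classify is a hypothetical helper. str.isascii() and bytes.isascii() need Python 3.7 or later. A successful UTF-8 decode only shows the bytes are valid UTF-8, not that they were meant as UTF-8.

```python
def classify(value) -> str:
    """Describe a str or bytes value in terms Python 3 can actually check (sketch)."""
    if isinstance(value, str):
        # Every str is Unicode; isascii() tells whether all code points are below 128.
        return "str, ASCII-only" if value.isascii() else "str, contains non-ASCII"
    if isinstance(value, bytes):
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            return "bytes, not valid UTF-8"
        return "bytes, ASCII-only" if value.isascii() else "bytes, valid UTF-8"
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


for v in ["hello", "h\u00e9llo", "h\u00e9llo".encode("utf-8"), b"\xff"]:
    print(repr(v), "->", classify(v))
```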
305 votes, 6 answers

u'\ufeff' in Python string

I got an error with the following exception message: UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 155: ordinal not in range(128) Not sure what u'\ufeff' is, it shows up when I'm web scraping. How can I remedy the…
James Hallen
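U+FEFF is the byte order mark (BOM). In UTF-8 it is the three bytes EF BB BF, which some tools place at the start of a file or response. A Python 3 sketch: the utf-8-sig codec drops a leading BOM while decoding, and str.replace removes any BOM that survives in text that is already decoded.

```python
# UTF-8 bytes that start with a BOM.
raw = b"\xef\xbb\xbfhello"

with_bom = raw.decode("utf-8")         # '\ufeffhello': the BOM survives as a character
without_bom = raw.decode("utf-8-sig")  # 'hello': utf-8-sig strips a leading BOM

# For strings that are already decoded (for example, scraped text):
cleaned = with_bom.replace("\ufeff", "")

print(repr(with_bom), repr(without_bom), repr(cleaned))
```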
296 votes, 17 answers

How to use UTF-8 in resource properties with ResourceBundle

I need to use UTF-8 in my resource properties using Java's ResourceBundle. When I enter the text directly into the properties file, it displays as mojibake. My app runs on Google App Engine. Can anyone give me an example? I can't get this to work.
nacho
295 votes, 5 answers

UTF-8: General? Bin? Unicode?

I'm trying to figure out what collation I should be using for various types of data. 100% of the content I will be storing is user-submitted. My understanding is that I should be using UTF-8 General CI (Case-Insensitive) instead of UTF-8 Binary.…
Dolph
273 votes, 11 answers

UTF-8 byte[] to String

Let's suppose I have just used a BufferedInputStream to read the bytes of a UTF-8 encoded text file into a byte array. I know that I can use the following routine to convert the bytes to a string, but is there a more efficient/smarter way of doing…
skeryl
260 votes, 11 answers

PHP DOMDocument loadHTML not encoding UTF-8 correctly

I'm trying to parse some HTML using DOMDocument, but when I do, I suddenly lose my encoding (at least that is how it appears to me). $profile = "various japanese characters"; $dom = new DOMDocument(); $dom->loadHTML($profile);…
Slightly A.
247 votes, 8 answers

Write to UTF-8 file in Python

I'm really confused with the codecs.open function. When I do: file = codecs.open("temp", "w", "utf-8") file.write(codecs.BOM_UTF8) file.close() It gives me the error UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal…
John Jiang
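codecs.BOM_UTF8 is a byte string (b'\xef\xbb\xbf'), but a writer opened with codecs.open(..., "utf-8") expects text. In Python 2 the writer appears to decode those bytes with the default ASCII codec before encoding them, which would explain the complaint about byte 0xef. In Python 3 the built-in open() covers both cases through its encoding argument, as this sketch shows (the temporary file names are illustrative): "utf-8-sig" writes the BOM for you, and plain "utf-8" writes none.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
with_bom_path = os.path.join(folder, "with_bom.txt")
no_bom_path = os.path.join(folder, "no_bom.txt")

# "utf-8-sig" writes the BOM (EF BB BF) automatically before the first character.
with open(with_bom_path, "w", encoding="utf-8-sig") as f:
    f.write("hello")

# Plain "utf-8" writes the text with no BOM.
with open(no_bom_path, "w", encoding="utf-8") as f:
    f.write("hello")

# Read both back as raw bytes to see the difference.
with open(with_bom_path, "rb") as f:
    with_bom_bytes = f.read()
with open(no_bom_path, "rb") as f:
    no_bom_bytes = f.read()

print(with_bom_bytes)  # b'\xef\xbb\xbfhello'
print(no_bom_bytes)    # b'hello'
```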