Questions tagged [utf-8]

UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.

UTF-8 is the most widely used character encoding on the web and is recommended for use on the Internet. It is the default character encoding on Linux and most other modern Unix-like operating systems.

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
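
As a sketch of the RFC 3629 bit layout (written in Python only for illustration; the name encode_utf8 is our own), the function below encodes a single code point by hand and cross-checks the result against Python's built-in codec. It rejects the surrogate range U+D800..U+DFFF, which RFC 3629 does not allow to be encoded.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 using the RFC 3629 bit layout."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point < 0x80:
        # 0xxxxxxx: one byte, identical to ASCII
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    for ch in "Aß€👍":
        ours = encode_utf8(ord(ch))
        assert ours == ch.encode("utf-8")  # cross-check against Python's codec
        print(f"U+{ord(ch):04X} -> {ours.hex(' ')}")
```

The leading bits of the first byte announce the sequence length, and every continuation byte starts with 10, which is what lets a decoder resynchronize after a damaged byte.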

Related tags

The character-encoding tag discusses the general concept of character-set encodings
The unicode character set can be represented in a variety of encodings, one of which is UTF-8
The ascii character set and encoding, which UTF-8 generalizes
Other UTFs: utf-16, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36
MySQL's full (four-byte) UTF-8 character set: utf8mb4
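
The related encodings above can be compared with a short Python sketch (the helper names encoded_sizes and is_ascii_compatible are hypothetical). It reports how many bytes a few characters take in UTF-8, UTF-16 and UTF-32, and checks which encodings leave plain ASCII text byte-for-byte unchanged.

```python
# The "-le" variants are used so that no byte-order mark is counted
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict:
    """Byte length of `text` in each encoding listed in ENCODINGS."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


def is_ascii_compatible(encoding: str, text: str = "plain ASCII text") -> bool:
    """True if ASCII-only text encodes to exactly the same bytes as in ASCII."""
    return text.encode(encoding) == text.encode("ascii")


if __name__ == "__main__":
    for ch in ["A", "é", "€", "👍"]:
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
    for enc in ("utf-8", "iso-8859-1", "utf-16-le"):
        print(enc, "ASCII-compatible:", is_ascii_compatible(enc))
```

UTF-8 and ISO-8859-1 both pass the ASCII check; UTF-16 does not, because every ASCII character gains a zero byte.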

22178 questions

1 answer

Pandas: UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte

Community, I want to open a CSV using pandas and perform analysis on it. Please help, as I am not able to open the CSV itself. I tried opening it with UTF-8, Latin-1, and ISO-8859-1 encodings, but it didn't work.…

pandas csv unicode utf-8 codec

asked Apr 17 '20 at 05:36

Sezal Chug
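
A UnicodeDecodeError carries the byte offsets of the offending sequence, which helps when guessing a file's real encoding. The Python sketch below (find_first_invalid_utf8 is a hypothetical helper, and the sample bytes stand in for the start of the CSV) reports the first ill-formed sequence. Note that Latin-1 assigns a character to every byte value, so decoding with it never raises; it can only produce wrong characters.

```python
def find_first_invalid_utf8(data: bytes):
    """Return (start, end, reason) for the first ill-formed UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start/exc.end delimit the offending bytes; exc.reason is the codec's message
        return exc.start, exc.end, exc.reason
    return None


if __name__ == "__main__":
    # Stand-in for the first bytes of the CSV: "Fußnote" saved as Latin-1, not UTF-8.
    # With a real file you would use open(path, "rb").read(4096); a read that stops
    # mid-character can also report "unexpected end of data" at the very end.
    sample = "Fußnote".encode("latin-1")
    result = find_first_invalid_utf8(sample)
    if result is not None:
        start, end, reason = result
        print(reason, "at", start, "bytes:", sample[max(0, start - 8):end + 8])
```

Once the surrounding bytes suggest the actual encoding, it can be passed to pandas.read_csv through its encoding parameter.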

1 answer

Reading One Line from a File in UTF-16 Format

I have some files generated from a script that provide information about various computers. The .txt files are in UTF-8; however, one line is in UTF-16. How should I go about reading this line from the file? P.S. I'm trying to…

java utf-8

asked Jun 07 '19 at 15:38

Ben Combs

1 answer

WMIC command in batch outputting non UTF-8 text files

I'm using a WMIC command to output a list of SIDs and accompanying user profile names to a text file. From the text, I can edit a list of SIDs I need to add a set of registry keys to. However, the script that loops through the edited text file of SIDs is…

windows batch-file cmd utf-8 wmic

asked Mar 23 '19 at 04:18

Tika9o9

5 answers

Select MySQL rows with Japanese characters

Would anyone know of a reliable method (with MySQL or otherwise) to select rows in a database that contain Japanese characters? I have a lot of rows in my database, some of which only have alphanumeric characters, some of which have Japanese…

mysql utf-8 phpmyadmin

asked Mar 19 '11 at 06:19

Rio

3 answers

Is it true that string literals in PHP can only be encoded in an encoding which is a compatible superset of ASCII, such as UTF-8 or ISO-8859-1?

I came across the following text on the Details of the String Type page of the PHP Manual: Given that PHP does not dictate a specific encoding for strings, one might wonder how string literals are encoded. String will be encoded in whatever …

php encoding utf-8 ascii non-ascii-characters

asked Sep 23 '18 at 16:00

PHPLover

6 answers

String to byte array in UTF-8?

How to convert a WideString (or other long string) to byte array in UTF-8?

utf-8 lazarus freepascal

asked Mar 08 '11 at 14:01

Mariusz

2 answers

How to Convert UTF8 ArrayBuffer to UTF16 JavaScript String

The answers from here got me started on how to use the ArrayBuffer: Converting between strings and ArrayBuffers. However, they take quite a few different approaches. The main one is this: function ab2str(buf) { return…

javascript encoding utf-8 arraybuffer

asked Jul 24 '18 at 21:12

Lance

3 answers

Decode or unescape \u00f0\u009f\u0091\u008d to 👍

We all know UTF-8 is hard. I exported my messages from Facebook, and the resulting JSON file escaped all non-ASCII characters to Unicode code points. I am looking for an easy way to unescape these Unicode code points to regular old UTF-8. I also…

json facebook powershell utf-8 facebook-messenger

asked Jun 12 '18 at 22:38

Dennis G
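
The escapes \u00f0\u009f\u0091\u008d are the four UTF-8 bytes of U+1F44D, each written as a separate code point below 256. Because Latin-1 maps those code points one-to-one back to bytes, re-encoding with Latin-1 and decoding as UTF-8 recovers the character. A minimal Python sketch (rather than PowerShell), assuming a hypothetical "content" field in the exported JSON:

```python
import json


def fix_mojibake(text: str) -> str:
    """Reinterpret characters U+0000..U+00FF as raw bytes and decode them as UTF-8."""
    # Latin-1 maps each code point below 256 to the byte with the same value,
    # so this recovers the UTF-8 bytes that the exporter escaped one by one.
    return text.encode("latin-1").decode("utf-8")


# Sample in the shape described in the question (field name "content" is hypothetical)
raw = r'{"content": "\u00f0\u009f\u0091\u008d"}'

message = json.loads(raw)["content"]
fixed = fix_mojibake(message)

if __name__ == "__main__":
    print(len(message), "characters before,", len(fixed), "after:", fixed)
```

This only works when every character in the string is below U+0100; otherwise encode raises UnicodeEncodeError, which signals that the text was not double-encoded in this way.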

5 answers

Django MySQL 'utf8' is currently an alias for the character set UTF8MB3, which will be replaced by UTF8MB4

I am using Django 2.0.4, MySQL 8.0.11, mysqlclient-1.3.12 and Python 3.6.5 on Mac Sierra. I am receiving the following warning: /lib/python3.6/site-packages/django/db/backends/mysql/base.py:71: Warning: (3719, "'utf8' is currently an alias for the…

mysql django utf-8 utf8mb4

asked Apr 25 '18 at 22:27

Lehrian
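
MySQL's utf8mb3 character set, which 'utf8' currently aliases according to the warning, stores at most three bytes per character, so it cannot hold characters above U+FFFF such as most emoji; utf8mb4 can. The Python sketch below (both helper names are hypothetical) checks whether a piece of text actually needs utf8mb4.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character lies above U+FFFF and therefore needs 4 UTF-8 bytes."""
    return any(ord(ch) > 0xFFFF for ch in text)


def max_utf8_bytes_per_char(text: str) -> int:
    """Longest UTF-8 sequence used by any single character in `text` (0 if empty)."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


if __name__ == "__main__":
    for sample in ("Fußnote", "日本語", "thumbs up 👍"):
        print(repr(sample), max_utf8_bytes_per_char(sample), needs_utf8mb4(sample))
```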

4 answers

Chrome says my content script isn't UTF-8

Receiving the error "Could not load file 'worker.js' for content script. It isn't UTF-8 encoded." Yet file -I chrome/worker.js reports chrome/worker.js: text/plain; charset=utf-8. With to-utf8-unix: to-utf8-unix chrome/worker.js …

google-chrome-extension utf-8 character-encoding clojurescript

asked Apr 23 '18 at 10:56

patchrail

5 answers

Writing utf-8 string inside my python files

This line in my .py file is giving me a "UnicodeDecodeError: 'utf8' codec can't decode bytes in position 8-13: unsupported Unicode code range": if line.startswith(u"Fußnote"): The file is saved in UTF-8 and has the encoding declaration at the top: # -*- coding:…

python unicode utf-8

asked Jan 27 '11 at 02:43

user317033

4 answers

How to convert String with â€œ (ISO-8859-1) characters to normal (UTF-8) characters?

Jain R.K. and Iyengar S.R.K., â€œAdvanced Engineering Mathematicsâ€, Narosa Publications,

I have a lot of raw HTML strings in a database. All the text has these weird characters. How can I convert them to normal text for saving back in…

php mysql utf-8 character-encoding iso-8859-1

asked Dec 31 '17 at 19:43

muthukrishnan
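
"â€œ" is what the three UTF-8 bytes of “ (E2 80 9C) look like when they are read as Windows-1252. Since € and œ do not exist in ISO-8859-1 at all, the mis-decoding was most likely Windows-1252. A minimal Python sketch of the round trip (repair_cp1252_mojibake is a hypothetical name) that leaves text unchanged when it cannot be repaired cleanly:

```python
def repair_cp1252_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mis-decoded as Windows-1252 (a sketch)."""
    try:
        # Turn the garbled characters back into the original bytes, then decode properly
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not (cleanly) double-encoded; leave the text untouched
        return text


if __name__ == "__main__":
    print(repair_cp1252_mojibake("â€œAdvanced Engineering Mathematics"))
```

The whole-string round trip fails when a sequence has lost a byte, as the closing quote in the sample probably has (its third byte, 0x9D, has no Windows-1252 character), so such strings are returned unchanged and need fragment-by-fragment handling.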

1 answer

pandoc complains about utf-8 decoding error even if my file is valid utf-8 encoded file

I am trying to convert a Markdown file to PDF using pandoc on a Windows system. Since my Markdown contains Chinese characters, I use the following command to produce the PDF: pandoc --pdf-engine=xelatex -V CJKmainfont=KaiTi test.md -o test.pdf But…

bash unicode utf-8 character-encoding pandoc

asked Dec 23 '17 at 17:45

jdhao

4 answers

Disposition of HTML-Select having UTF-Icons

Given this HTML, the second input is displaced. How can I avoid this effect?

html google-chrome utf-8 symbols

asked Dec 02 '17 at 12:56

Grim

4 answers

Python read csv with Hebrew header

I tried to use dataset=pandas.read_csv('filename') to create a DataFrame. But somehow I can't do it because one of the column headers is written in Hebrew. I checked, and it is possible for a DataFrame to have a Hebrew word as a column header. …

python pandas csv utf-8 hebrew

asked Nov 20 '17 at 14:03

Matan
