Questions tagged [utf-8]

UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.

UTF-8 is a character encoding that can describe the set of Unicode code points in byte sequences of one to four bytes.

UTF-8 is the most widely used character encoding, and is recommended for use on the Internet. It is the standard character encoding on Linux and other recent Unix-like operating systems. It was designed to be backwards-compatible with ASCII while still supporting representation of all Unicode code points.

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
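As a rough illustration of the bit layout that RFC 3629 describes, here is a minimal Python sketch that encodes a single code point by hand. The function name `utf8_encode` is made up for this example; in real code you would simply call `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper, for illustration only)."""
    # RFC 3629 restricts UTF-8 to U+0000..U+10FFFF and excludes the surrogate range.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp!r}")
    if cp <= 0x7F:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp <= 0x7FF:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# Compare the hand-rolled encoder with Python's built-in codec.
for ch in ["A", "ß", "€", "\U0001F44D"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {utf8_encode(ord(ch)).hex(' ')}")
```

The one-byte branch is what makes UTF-8 backwards-compatible with ASCII: any code point up to U+007F is stored as the same single byte it has in ASCII.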

22178 questions
8
votes
1 answer

Pandas: UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte

I want to open a CSV using pandas and perform analysis on it. Please help, as I am not able to open the CSV itself. I tried opening it with UTF-8, Latin-1, and ISO-8859-1 encoding. It didn't work.…
Sezal Chug
  • 81
  • 1
  • 1
  • 3
8
votes
1 answer

Reading One Line from a File in UTF-16 Format

I have some files generated from a script that provide information about various computers. The txt files are in UTF-8, however, there is one line that is in UTF-16 format. How should I go about reading this line from the file? P.S. I'm trying to…
Ben Combs
  • 83
  • 3
8
votes
1 answer

WMIC command in batch outputting non UTF-8 text files

I'm using a WMIC command to output a list of SIDS and accompanying user profile names to text. From the text, I can edit a list of SIDS I need to add a set of registry keys to. However, the script that loops through the edited text file of SIDS is…
Tika9o9
  • 405
  • 4
  • 22
8
votes
5 answers

Select MySQL rows with Japanese characters

Would anyone know of a reliable method (with MySQL or otherwise) to select rows in a database that contain Japanese characters? I have a lot of rows in my database, some of which only have alphanumeric characters, some of which have Japanese…
Rio
  • 14,182
  • 21
  • 67
  • 107
8
votes
3 answers

Is it true that string literals in PHP can only be encoded in an encoding which is a compatible superset of ASCII, such as UTF-8 or ISO-8859-1?

I came across the following text on the Details of the String Type page of the PHP Manual: Given that PHP does not dictate a specific encoding for strings, one might wonder how string literals are encoded. String will be encoded in whatever …
PHPLover
  • 1
  • 51
  • 158
  • 311
8
votes
6 answers

String to byte array in UTF-8?

How to convert a WideString (or other long string) to byte array in UTF-8?
Mariusz
  • 1,825
  • 5
  • 22
  • 36
8
votes
2 answers

How to Convert UTF8 ArrayBuffer to UTF16 JavaScript String

The answers from here got me started on how to use the ArrayBuffer: Converting between strings and ArrayBuffers However, they have quite a bit of different approaches. The main one is this: function ab2str(buf) { return…
Lance
  • 75,200
  • 93
  • 289
  • 503
8
votes
3 answers

Decode or unescape \u00f0\u009f\u0091\u008d to 👍

We all know UTF-8 is hard. I exported my messages from Facebook and the resulting JSON file escaped all non-ascii characters to unicode code points. I am looking for an easy way to unescape these unicode code points to regular old UTF-8. I also…
Dennis G
  • 21,405
  • 19
  • 96
  • 133
8
votes
5 answers

Django MySQL 'utf8' is currently an alias for the character set UTF8MB3, which will be replaced by UTF8MB4

I am using Django 2.0.4, MySQL 8.0.11, mysqlclient-1.3.12 and Python 3.6.5 on Mac Sierra. I am receiving the following warning: /lib/python3.6/site-packages/django/db/backends/mysql/base.py:71: Warning: (3719, "'utf8' is currently an alias for the…
Lehrian
  • 347
  • 1
  • 2
  • 9
8
votes
4 answers

Chrome says my content script isn't UTF-8

Receiving the error Could not load file 'worker.js' for content script. It isn't UTF-8 encoded. > file -I chrome/worker.js chrome/worker.js: text/plain; charset=utf-8 With to-utf8-unix > to-utf8-unix chrome/worker.js …
8
votes
5 answers

Writing utf-8 string inside my python files

This line in my .py file is giving me a: "UnicodeDecodeError: 'utf8' codec can't decode bytes in position 8-13: unsupported Unicode code range" if line.startswith(u"Fußnote"): The file is saved in utf-8 and has the encoding at the top: # -- coding:…
user317033
8
votes
4 answers

How to convert String with “ (ISO-8859-1) characters to normal (UTF-8) characters?

Jain R.K. and Iyengar S.R.K., “Advanced Engineering Mathematicsâ€, Narosa Publications,
I have a lot of raw HTML strings in the database. All the text has these weird characters. How can I convert it to normal text for saving it back in…
muthukrishnan
  • 275
  • 1
  • 4
  • 17
8
votes
1 answer

pandoc complains about utf-8 decoding error even if my file is valid utf-8 encoded file

I am trying to convert a markdown file to pdf using pandoc on Windows system. Since my markdown contains Chinese characters, I use the following command to produce the pdf: pandoc --pdf-engine=xelatex -V CJKmainfont=KaiTi test.md -o test.pdf But…
jdhao
  • 24,001
  • 18
  • 134
  • 273
8
votes
4 answers

Disposition of HTML-Select having UTF-Icons

Having this HTML The second input is dispositioned. How to avoid this effect?
Grim
  • 1,938
  • 10
  • 56
  • 123
8
votes
4 answers

Python read csv with Hebrew header

I tried to use dataset=pandas.read_csv('filename') to make a framework. But somehow I can't do it because one of the column headers is written in Hebrew. I checked, and it is possible for a DataFrame to have a Hebrew word as column header. …
Matan
  • 322
  • 1
  • 4
  • 14