Questions tagged [utf-8]

UTF-8 is a character encoding that represents each Unicode code point as a sequence of one to four bytes. It is backward-compatible with ASCII while still able to represent every Unicode code point.
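
The one-to-four-byte range is easy to see in practice. The Python sketch below is an illustration written for this page and uses only the standard library. It encodes one character from each UTF-8 length class and checks that pure ASCII text produces identical bytes in ASCII and UTF-8.

```python
# One sample character from each UTF-8 length class.
samples = ["A", "é", "€", "😀"]  # U+0041, U+00E9, U+20AC, U+1F600

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) -> {encoded.hex(' ')}")

# ASCII compatibility: pure ASCII text has identical bytes in ASCII and UTF-8.
assert "Hello".encode("utf-8") == "Hello".encode("ascii")
```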

UTF-8 is the most widely used character encoding and is recommended for use on the Internet. It is the default character encoding on Linux and other modern Unix-like operating systems.

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
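
A minimal sketch of the bit layout that RFC 3629 specifies, written in Python for illustration only. `encode_code_point` and `encode_utf8` are hypothetical helper names; real code should simply call `str.encode("utf-8")`, which the sketch uses as its reference.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value using the RFC 3629 bit patterns."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a valid Unicode scalar value")
    if cp < 0x80:                      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encode_utf8(text: str) -> bytes:
    """Encode a whole string by concatenating the per-code-point byte sequences."""
    return b"".join(encode_code_point(ord(ch)) for ch in text)


sample = "A é € 😀"
assert encode_utf8(sample) == sample.encode("utf-8")
```

Surrogate code points (U+D800 to U+DFFF) are rejected because RFC 3629 forbids encoding them, which matches the default behavior of Python's built-in codec.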

Related tags

The character-encoding tag discusses the general concept of character-set encodings
The unicode character set can be represented in several encodings, one of which is UTF-8
The ascii character set and encoding, which UTF-8 generalizes
Other UTFs: utf-16, utf-32; rarely used: utf-7, utf-1, utf-18, utf-36. Related: utf8mb4, MySQL's name for full UTF-8 (up to four bytes per character)

22178 questions

2 answers

Unicode to UTF-8

I'm using VBScript to extract data from DB2 and write it to a file. I write the file like this: Set objTextFile = objFSO.CreateTextFile(sFilePath, True, True), which creates the file in Unicode. But it is an XML file and it uses UTF-8, so when I open the XML file with…

unicode vbscript utf-8 character-encoding

asked Nov 08 '10 at 16:26

Ruslan
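
For context: with the third argument set to True, CreateTextFile writes what Windows calls a "Unicode" file, which means UTF-16 little-endian with a byte-order mark, not UTF-8. The Python sketch below is illustrative only (the XML content is made up); it shows both byte layouts and converts one into the other.

```python
import codecs

xml_text = '<?xml version="1.0" encoding="UTF-8"?><name>Müller</name>'

# A Windows "Unicode" text file: UTF-16LE byte-order mark, then UTF-16LE code units.
utf16_bytes = codecs.BOM_UTF16_LE + xml_text.encode("utf-16-le")

# What the XML declaration promises: plain UTF-8 (no BOM required).
utf8_bytes = xml_text.encode("utf-8")

# Converting is decode-then-encode; the "utf-16" codec reads and strips the BOM.
converted = utf16_bytes.decode("utf-16").encode("utf-8")
assert converted == utf8_bytes

print(utf16_bytes[:4].hex(" "), "vs", utf8_bytes[:4].hex(" "))
```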

1 answer

UnicodeDecodeError: 'gbk' codec can't decode byte when reading JSON that contains Chinese

I'm switching from Python 2 to 3. In my Jupyter notebook the code is: file = "./data/test.json" with open(file) as data_file: data = json.load(data_file). It used to work fine with Python 2, but now, right after switching to Python 3, it gives me…

python json unicode utf-8

asked Dec 06 '16 at 14:23

ZK Zhao
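
In Python 3, open() without an encoding argument falls back to the locale's preferred encoding, which on a Chinese-language Windows system is typically GBK (code page 936), so a UTF-8 JSON file fails to decode. A minimal sketch of the usual fix, assuming the file really is UTF-8; a temporary file stands in for ./data/test.json.

```python
import json
import os
import tempfile

data = {"city": "北京"}

# Write a small JSON file as UTF-8; ensure_ascii=False keeps the Chinese characters unescaped.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".json", delete=False) as f:
    json.dump(data, f, ensure_ascii=False)
    path = f.name

# Read it back with an explicit encoding instead of relying on the locale default.
with open(path, encoding="utf-8") as data_file:
    loaded = json.load(data_file)

os.remove(path)
assert loaded == data
```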


3 answers

Python MySQL CSV export to json strange encoding

I received a CSV file exported from a MySQL database (I think the encoding is latin1 since the language is Spanish). Unfortunately the encoding is wrong and I cannot process it at all. If I use file: $ file -I file.csv file.csv: text/plain;…

python mysql json csv utf-8

asked Oct 25 '16 at 09:23

alexsc
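
A sketch of the two usual repairs, assuming the bytes really are latin1. Note that MySQL's latin1 is in fact Windows-1252, so "cp1252" may be the more accurate codec for characters such as €. `latin1_to_utf8` is a hypothetical helper name.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Reinterpret bytes as ISO-8859-1 (latin1) and re-encode them as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


# A latin1-encoded CSV: ñ is the single byte 0xF1.
raw = "id,palabra\n1,año\n".encode("latin-1")
fixed = latin1_to_utf8(raw)
assert fixed.decode("utf-8") == "id,palabra\n1,año\n"

# Related symptom: UTF-8 bytes wrongly decoded as latin1 show "Ã±" instead of "ñ".
mojibake = "año".encode("utf-8").decode("latin-1")
assert mojibake == "aÃ±o"

# Reversing the wrong step recovers the original text.
assert mojibake.encode("latin-1").decode("utf-8") == "año"
```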


2 answers

python decode partial utf-8 byte array

I'm getting data from a channel that is not aware of UTF-8 rules. So sometimes, when UTF-8 uses multiple bytes to encode one character and I try to convert part of the received data into text, I get an error during conversion. By nature of…

python utf-8

asked Oct 14 '16 at 13:34

Vit Bernatik
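
Python's standard library handles this case with an incremental decoder, which buffers an incomplete multi-byte sequence at the end of one chunk until the rest arrives. A minimal sketch; the chunk boundaries are made up for illustration.

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()

# "€" is three bytes (E2 82 AC); simulate it arriving split across two chunks.
chunks = [b"price: \xe2\x82", b"\xac 5"]

text = ""
for chunk in chunks:
    # An incomplete trailing sequence is buffered instead of raising UnicodeDecodeError.
    text += decoder.decode(chunk)
text += decoder.decode(b"", final=True)  # flush; raises if bytes are still incomplete

assert text == "price: € 5"
```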


2 answers

How can I get accents (as tone marks) over Chinese characters in LaTeX?

Tone marks above Chinese characters in LaTeX / Combining accents in Unicode. My aim is to put tone marks above Chinese characters in LaTeX, and Google doesn't seem to be letting on to the answer. Is it possible to use combining accents with Chinese…

unicode utf-8 latex diacritics cjk

asked Oct 23 '10 at 10:29

Twig
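
At the encoding level, a tone mark is a combining character (for the first tone, U+0304 COMBINING MACRON) placed after its base character. The Python sketch below is illustrative only and says nothing about how a particular TeX engine or font renders the result; it shows that Latin vowels have precomposed forms while Chinese characters do not.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON, the first-tone mark in pinyin

# A Latin vowel has a precomposed form, so NFC merges the pair into one code point (ā).
latin = unicodedata.normalize("NFC", "a" + MACRON)
assert latin == "\u0101" and len(latin) == 1

# A Chinese character has no precomposed accented form: base + mark stay two code points.
hanzi = unicodedata.normalize("NFC", "中" + MACRON)
assert len(hanzi) == 2
print(hanzi.encode("utf-8").hex(" "))  # 3 bytes for 中, then 2 bytes for U+0304
```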

1 answer

Best practice: Should I try to change to UTF-8 as locale or is it safe to leave it as is?

I try to set my default encoding to UTF-8; up to now without success: a <- "Hallo" b <- "äöfd" print(Encoding(a)) # [1] "unknown" print(Encoding(b)) # [1] "latin1" options(encoding = "UTF-8") a <- "Hallo" b <- "äöfd" print(Encoding(a)) # [1]…

r windows encoding utf-8

asked Sep 22 '16 at 07:24

Christoph


2 answers

Android displays text in wrong encoding after update to Java 8

I've updated my project to SDK version 24 and Java 8 and encountered a strange encoding issue. For some reason, Android treats my hardcoded UTF-8 strings as Windows-1251, and thus the text is garbled. Like this: This is what I…

android android-studio encoding utf-8 java-8

asked Sep 15 '16 at 14:56

FelisManulus
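
Garbled text of this kind is what UTF-8 bytes look like when something decodes them as Windows-1251. The Python sketch below is an illustration, not the Android fix itself: it reproduces the symptom and shows that it is reversible, which is a quick way to confirm the diagnosis.

```python
original = "Привет"

# Decoding UTF-8 bytes with the wrong code page produces the garbled output.
garbled = original.encode("utf-8").decode("cp1251")
print(garbled)
assert garbled != original

# Every byte used here is mapped in cp1251, so the damage can be undone.
assert garbled.encode("cp1251").decode("utf-8") == original
```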

1 answer

Truncated Read With UTF-16-Encoded Text in C++

My goal is to convert external input sources to a common, UTF-8 internal encoding, since it is compatible with many libraries I use (such as RE2) and is compact. Since I do not need to do string slicing except with pure ASCII, UTF-8 is an ideal…

c++ c++11 encoding utf-8 utf-16

asked Sep 12 '16 at 00:02

Alex Huszagh
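
One frequent pitfall with UTF-16 data (it may or may not be the cause here, since the excerpt is cut off) is that UTF-16 text contains many zero bytes, so code that treats it as a NUL-terminated narrow string stops early. The Python sketch below is illustrative only; it shows the zero bytes and the decode-then-encode conversion to UTF-8.

```python
text = "RE2 ok 😀"

utf16 = text.encode("utf-16-le")
utf8 = text.encode("utf-8")

# UTF-16LE stores "R" as 52 00, so a NUL-terminated read stops after one byte.
assert utf16[1] == 0
# UTF-8 only produces a 0x00 byte for U+0000 itself.
assert b"\x00" not in utf8

# Transcoding: the surrogate pair for U+1F600 becomes one 4-byte UTF-8 sequence.
assert utf16.decode("utf-16-le").encode("utf-8") == utf8
```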


0 answers

UTF-8 with R Markdown, knitr and Windows

What? An .Rmd file renders without errors via knitr (or rmarkdown) on Linux. Related material (i.e. child R scripts and CSV input data) is all in UTF-8. Executing the same script from within Windows (actually the script is inside a…

r windows utf-8 knitr r-markdown

asked Aug 18 '16 at 16:05

Nikos Alexandris

2 answers

How to remove strange characters using gsub in R?

I'm trying to clean up some text that was loaded into memory using readLines(..., encoding='UTF-8'). If I don't specify the encoding, I see all kinds of strange characters like: > "The way I talk to my family......i would get my ass beat to >…

r unicode utf-8

asked Aug 08 '16 at 11:57

Nate Reed


2 answers

Rails, MySQL, Unicode data and latin1 tables - Where to go from here?

I'm not 100% sure on the particulars, so I'd love someone straightening me out, but I'll forge ahead with what I think is going on... When I first setup my database, I used the default character encoding of the system without even thinking, and it…

mysql ruby-on-rails utf-8 character-encoding

asked Oct 03 '10 at 14:12

Micah


4 answers

How do I fix invalid HTML characters in pages served with different encoding?

I have a number of websites that are rendering invalid characters. The pages' meta tags specify UTF-8 encoding. However, a number of pages contain characters that can't be interpreted by UTF-8, probably because the files were saved with another…

html utf-8 character-encoding non-unicode

asked Sep 30 '10 at 17:42

Andy
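
A minimal Python sketch of a diagnose-and-convert approach, assuming (hypothetically) that the legacy files were saved as Windows-1252. `decode_page` is an illustrative helper name, not a library function.

```python
def decode_page(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding): strict UTF-8 when the bytes are valid, else a Windows-1252 guess."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: legacy pages were saved as Windows-1252; unmapped bytes become U+FFFD.
        return raw.decode("cp1252", errors="replace"), "cp1252"


# "café" saved as Windows-1252: é is the lone byte 0xE9, which is invalid UTF-8.
legacy = "café".encode("cp1252")
text, enc = decode_page(legacy)
print(enc, text)

# Re-saving the decoded text as UTF-8 makes the file match the pages' meta tag.
fixed = text.encode("utf-8")
```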

3 answers

Working with files and utf8 in PHP

Let's say I have a file called foo.txt encoded in UTF-8: aoeu qjkx ñpyf And I want to get an array that contains all the lines in that file (one line per index) that have the letters aoeuñpyf, and only the lines with these letters. I wrote the…

php file-io unicode utf-8

asked Sep 26 '10 at 23:36

Gerardo Marset
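
The key point is to work with decoded characters rather than raw bytes, so that ñ counts as one character instead of the two bytes C3 B1. A Python sketch, assuming "only these letters" means every character of a line must belong to the set (the question's PHP code is truncated); the in-memory bytes stand in for foo.txt.

```python
ALLOWED = set("aoeuñpyf")

raw = "aoeu\nqjkx\nñpyf\n".encode("utf-8")  # stands in for the contents of foo.txt

# Decode first so that "ñ" is one character rather than two bytes.
lines = raw.decode("utf-8").splitlines()
matching = [line for line in lines if line and set(line) <= ALLOWED]

assert matching == ["aoeu", "ñpyf"]
```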

1 answer

Git: Diff does not handle character encodings other than UTF-8?

Created a repo, added UTF8 and Latin2 encoded files with this content: árvíztűrő tükörfúrógép ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP See on https://github.com/bimlas/git-test/commit/872370caf91f1faaf931c1228c797f3d10d6435d The output of git log -p 82904e60…

windows git powershell encoding utf-8

asked Apr 08 '16 at 07:33

bimlas
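
The same Hungarian sample text produces different bytes in UTF-8 and in Latin-2 (ISO-8859-2), and Latin-2 bytes are not valid UTF-8, so any tool that assumes UTF-8 shows them as garbage. A Python sketch of that byte-level difference; it is illustrative only and does not configure Git.

```python
text = "árvíztűrő tükörfúrógép"

utf8 = text.encode("utf-8")
latin2 = text.encode("iso8859_2")  # Latin-2 has single-byte codes for ő and ű

# Same characters, different bytes.
assert utf8 != latin2
assert len(latin2) == len(text)  # one byte per character
assert latin2.decode("iso8859_2") == text

# Decoding the Latin-2 bytes as UTF-8 fails at the first accented letter.
try:
    latin2.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```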


2 answers

mb_strlen(): is it enough?

When counting the length of a UTF-8 string in PHP, I use mb_strlen(). For example: if (mb_strlen($name, 'UTF-8') < 3) { $error .= 'Name is required. Minimum of 3 characters required in name.'; } As the text fields can accept any language…

php utf-8 multilingual mbstring

asked Sep 04 '10 at 04:08

PHPLOVER
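
mb_strlen($name, 'UTF-8') counts code points, which is not always what a user perceives as characters: the same accented letter can be one precomposed code point or a base letter plus a combining mark. A Python sketch of the different counts (Python's len also counts code points; PHP's plain strlen counts bytes).

```python
import unicodedata

composed = "José"                                     # é as the single code point U+00E9
decomposed = unicodedata.normalize("NFD", composed)   # e followed by U+0301 COMBINING ACUTE ACCENT

assert len(composed.encode("utf-8")) == 5  # bytes (what strlen counts)
assert len(composed) == 4                  # code points (what mb_strlen(..., 'UTF-8') counts)
assert len(decomposed) == 5                # same visible text, one more code point

# Normalizing to NFC first makes the count match what users see in cases like this one.
assert len(unicodedata.normalize("NFC", decomposed)) == 4
```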

