Questions tagged [utf-8]

UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.

UTF-8 is a character encoding that can describe the set of Unicode code points in byte sequences of one to four bytes.

UTF-8 is the most widely used character encoding, and is recommended for use on the Internet. It is the standard character encoding on Linux and other recent Unix-like operating systems. It was designed to be backwards-compatible with ASCII while still supporting representation of all Unicode code points.

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
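
As an illustration of the one-to-four-byte layout, here is a minimal Python sketch of encoding a single code point. The function name `encode_code_point` is made up for this example and is not part of any library; in real code you would normally just call `str.encode("utf-8")`.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper, for illustration)."""
    # Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not encodable.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    for ch in ("A", "ñ", "€", "\U0001F600"):
        encoded = encode_code_point(ord(ch))
        # Cross-check against Python's built-in encoder.
        assert encoded == ch.encode("utf-8")
        print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```

The ASCII character `A` comes out as the single byte `41`, which is what makes UTF-8 backwards-compatible with ASCII, while `ñ`, `€` and U+1F600 take two, three and four bytes respectively.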

22178 questions
8
votes
2 answers

Unicode to UTF-8

I'm using VBScript to extract data from DB2 and write it to a file. I write to the file like this: Set objTextFile = objFSO.CreateTextFile(sFilePath, True, True), which creates the file in Unicode. But it is an XML file and it uses UTF-8. So when I open the XML file with…
Ruslan
  • 319
  • 1
  • 8
  • 17
8
votes
1 answer

UnicodeDecodeError: 'gbk' codec can't decode byte when read json contains chinese

I'm switching from Python 2 to 3. In my Jupyter notebook the code is file = "./data/test.json" with open(file) as data_file: data = json.load(data_file) It used to be fine with Python 2, but now, after switching to Python 3, it gives me…
ZK Zhao
  • 19,885
  • 47
  • 132
  • 206
8
votes
3 answers

Python MySQL CSV export to json strange encoding

I received a CSV file exported from a MySQL database (I think the encoding is latin1 since the language is Spanish). Unfortunately the encoding is wrong and I cannot process it at all. If I use file: $ file -I file.csv file.csv: text/plain;…
alexsc
  • 1,196
  • 1
  • 11
  • 21
8
votes
2 answers

python decode partial utf-8 byte array

I'm getting data from a channel which is not aware of UTF-8 rules. So sometimes, when UTF-8 uses multiple bytes to encode one character and I try to convert part of the received data into text, I get an error during conversion. By nature of…
Vit Bernatik
  • 3,566
  • 2
  • 34
  • 40
8
votes
2 answers

How can I get accents (as tone marks) over Chinese characters in LaTeX?

Tone marks above Chinese characters in LaTeX / Combining accents in Unicode. My aim is to put tone marks above Chinese characters in LaTeX, and Google does not seem to be letting on to the answer. Is it possible to use combining accents with Chinese…
Twig
  • 621
  • 5
  • 17
8
votes
1 answer

Best practice: Should I try to change to UTF-8 as locale or is it safe to leave it as is?

I try to set my default encoding to UTF-8; up to now without success: a <- "Hallo" b <- "äöfd" print(Encoding(a)) # [1] "unknown" print(Encoding(b)) # [1] "latin1" options(encoding = "UTF-8") a <- "Hallo" b <- "äöfd" print(Encoding(a)) # [1]…
Christoph
  • 6,841
  • 4
  • 37
  • 89
8
votes
2 answers

Android displays text in wrong encoding after update to Java 8

I've updated my project to SDK version 24 and Java 8 and encountered a strange encoding issue. For some strange reason Android treats my hardcoded UTF-8 strings as Windows-1251 and thus the text is garbled. Like this: This is what I…
FelisManulus
  • 440
  • 4
  • 18
8
votes
1 answer

Truncated Read With UTF-16-Encoded Text in C++

My goal is to convert external input sources to a common, UTF-8 internal encoding, since it is compatible with many libraries I use (such as RE2) and is compact. Since I do not need to do string slicing except with pure ASCII, UTF-8 is an ideal…
Alex Huszagh
  • 13,272
  • 3
  • 39
  • 67
8
votes
0 answers

UTF-8 with R Markdown, knitr and Windows

What? An .Rmd file renders error-free via knitr (or rmarkdown) from within Linux. Related material (i.e. child R scripts and CSV input data) is all set in UTF-8. Executing the same script from within Windows (actually the script is inside a…
Nikos Alexandris
  • 708
  • 2
  • 22
  • 36
8
votes
2 answers

How to remove strange characters using gsub in R?

I'm trying to clean up some text that was loaded into memory using readLines(..., encoding='UTF-8'). If I don't specify the encoding, I see all kinds of strange characters like: > "The way I talk to my family......i would get my ass beat to >…
Nate Reed
  • 6,761
  • 12
  • 53
  • 67
8
votes
2 answers

Rails, MySQL, Unicode data and latin1 tables - Where to go from here?

I'm not 100% sure on the particulars, so I'd love someone straightening me out, but I'll forge ahead with what I think is going on... When I first set up my database, I used the default character encoding of the system without even thinking, and it…
Micah
  • 17,584
  • 8
  • 40
  • 46
8
votes
4 answers

How do I fix invalid HTML characters in pages served with different encoding?

I have a number of websites that are rendering invalid characters. The pages' meta tags specify UTF-8 encoding. However, a number of pages contain characters that can't be interpreted by UTF-8, probably because the files were saved with another…
Andy
  • 856
  • 9
  • 26
8
votes
3 answers

Working with files and utf8 in PHP

Let's say I have a file called foo.txt encoded in UTF-8: aoeu qjkx ñpyf And I want to get an array that contains all the lines in that file (one line per index) that have the letters aoeuñpyf, and only the lines with these letters. I wrote the…
Gerardo Marset
  • 803
  • 1
  • 10
  • 23
8
votes
1 answer

Git: Diff does not handle character encodings other than UTF-8?

Created a repo, added UTF8 and Latin2 encoded files with this content: árvíztűrő tükörfúrógép ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP See on https://github.com/bimlas/git-test/commit/872370caf91f1faaf931c1228c797f3d10d6435d The output of git log -p 82904e60…
bimlas
  • 2,359
  • 1
  • 21
  • 29
8
votes
2 answers

mb_strlen() is it enough?

When counting the length of a UTF-8 string in PHP I use mb_strlen(). For example: if (mb_strlen($name, 'UTF-8') < 3) { $error .= 'Name is required. Minimum of 3 characters required in name.'; } As the text fields can accept any language…
PHPLOVER
  • 7,047
  • 18
  • 37
  • 54