Questions tagged [mojibake]

Garbled text that is the result of bytes being decoded using an incorrect character encoding.

Mojibake occurs when text is decoded from a byte stream using the wrong character encoding, producing an unreadable sequence of characters. The term "mojibake" comes from Japanese, where it literally means "character transformation".
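As a rough illustration of the mechanism, the Python sketch below encodes a string as UTF-8, decodes the bytes with the wrong codec to produce mojibake, and then reverses the mistaken step. The helper names `make_mojibake` and `repair_mojibake` are hypothetical, and the choice of Latin-1 as the wrong codec is an assumption; real cases may involve other codecs or several layers of mis-decoding.

```python
def make_mojibake(text: str, wrong_codec: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_codec)


def repair_mojibake(garbled: str, wrong_codec: str = "latin-1") -> str:
    """Undo a single layer of mojibake by reversing the mistaken decode.

    Assumes the text was UTF-8 that was decoded once as wrong_codec;
    raises UnicodeError if that assumption does not hold.
    """
    return garbled.encode(wrong_codec).decode("utf-8")


garbled = make_mojibake("élève")
print(garbled)                   # Ã©lÃ¨ve
print(repair_mojibake(garbled))  # élève
```

The repair only works while the garbled characters are intact; once they have been replaced with "?" or dropped, the original bytes cannot be recovered this way.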

Example mojibake:

ï»¿Ø§Ù"Ø¥Ø¹Ù"Ø§Ù† Ø§Ù"Ø¹Ø§Ù"Ù

References:

Wikipedia - Mojibake

150 questions

3 answers

Why is text in Swedish from a resource bundle showing up as gibberish?

Possible Duplicate: How to use UTF-8 in resource properties with ResourceBundle I want to add internationalization to my Java Swing application. I use a bundle file to hold all the labels. As a test I tried to set a Swedish title to a…

asked Dec 13 '10 at 12:13

Brad

1 answer

What does this mojibake/krakozyabry on The Simpsons say?

On Season 12 Episode 07 "The Great Money Caper" of The Simpsons, I noticed a few years ago "gibberish" signs on the Russian spaceship. Randomly today, I decided to search and see if anyone decoded them but couldn't find any results. I suspect that…

character-encoding mojibake

asked Jul 13 '12 at 20:52

chfoo

2 answers

python replace unicode characters

I wrote a program to read in Windows DNS debugging log, but inside always got some funny characters in the domain field. Below is one of the example: (13)\xc2\xb5\xc2\xb1\xc2\xbe\xc3\xa2p\xc3\xb4\xc2\x8d(5)example(3)com(0)' I want to replace all…

python mojibake

asked Sep 28 '16 at 15:25

kenneth171

1 answer

Python2.7 UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)

I am currently using python 2.7 and doing web scraping on a Chinese website. How to convert unicode below into a string? Simple str() function does not work and states UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11:…

python python-2.7 unicode encoding mojibake

asked Nov 14 '16 at 21:40

Perry Zhuang

1 answer

Russian symbols in Python output corrupted (ENCODING)

I parsed a HTML document and have Russian text in it. When I'm trying to print it in Python, I get this: ÐÐ»ÑÐ±Ð½Ð¸ÑÐ½ÑÐ¹ Ð½Ð¾Ð²Ð¾Ð³Ð¾Ð´Ð½Ð¸Ð¹ Ð¿ÑÐ½Ñ I tried to decode it and I get ISO-8859-1 encoding. I'm trying to decode it like that: print…

python encoding utf-8 cyrillic mojibake

asked Nov 11 '14 at 16:42

aaaapppp

1 answer

Unbaking mojibake

When you have incorrectly decoded characters, how can you identify likely candidates for the original string? Ä×èÈÄÄî▒è¤ô_üiâAâjâüâpâXüj_10òb.png I know for a fact that this image filename should have been some Japanese characters. But with…

python unicode character-encoding decoding mojibake

asked Jun 10 '14 at 12:01

wim

1 answer

Hebrew text in vba code doesn't decode properly

I've developed a workbook, with some underlying vba code. The workbook is in Hebrew, and the vba code uses Hebrew as well, e.g. comparing strings in Hebrew, or accessing Sheets using their Hebrew names. I've developed this workbook in Excel 2010,…

excel vba encoding hebrew mojibake

asked Jan 21 '14 at 22:13

Matan_ma

2 answers

Extract text from corrupt (?) pdf document

In a project I'm working on we scrape legal documents from various government sites and then make them searchable online. Every now and then we encounter a PDF that seems to be corrupt. Here's an example of one. If you open it in a PDF reader, it…

pdf corruption mojibake

asked Feb 10 '12 at 07:25

mlissner

1 answer

Character Encoding and the â€™ Issue

Even today, one sees character encoding problems with significant frequency. Take for example this recent job post: (Note: This is an example, not a spam job post... :-) I have recently seen that exact error on websites, in popular IM…

character-encoding cross-platform mojibake

asked Dec 07 '11 at 19:30

Eric J.

5 answers

Unexpected output of std::wcout << L"élève"; in Windows Shell

While testing some functions to convert strings between wchar_t and utf8 I met the following weird result with Visual C++ express 2008 std::wcout << L"élève" << std::endl; prints out "ÚlÞve:" which is obviously not what is expected. This is…

c++ unicode wchar-t mojibake

asked Apr 06 '09 at 13:07

chmike

4 answers

How to identify likely broken pdf pages before extracting its text?

TL;DR My workflow: (1) Download PDF; (2) Split it into pages using pdftk; (3) Extract text of each page using pdftotext; (4) Classify text and add metadata; (5) Send it to client in a structured format. I need to extract consistent text to jump from 3 to 4. If text is…

python bash pdf mojibake

asked Jul 10 '21 at 15:44

Kfcaio

1 answer

python unicode: when written to file, writes in different format

I am using Python 3.4, to write a unicode string to a file. After the file is written, if I open and see, it is totally a different set of characters. CODE:- # -*- coding: utf-8 -*- with open('test.txt', 'w', encoding='utf-8') as f: name =…

python python-3.x unicode utf-8 mojibake

asked Sep 05 '15 at 12:23

Remis Haroon - رامز

2 answers

How do I transform "Ð¢ÐµÑ" (it is russian word) into something readable?

I got MySQL DB which contains UTF8 column with such "Ð¢ÐµÑ" records. PHP's mb_detect_encoding() told me that this is UTF-8. How can I transform this "horror" into something readable? Thank you

php mysql encoding character-encoding mojibake

asked Jul 07 '10 at 13:38

Kirzilla

7 answers

Pound symbol not displaying on web page

I have a mysql database table to store country name and currency symbol - the CHARSET has correctly set to UTF8. This is example data inserted into the table insert into country ( country_name, currency_name, currency_code, currency_symbol) values…

mysql html internationalization special-characters mojibake

asked Jun 02 '10 at 16:15

Gublooo

2 answers

Identify garbage unicode string using python

My script reads data from a CSV file, which can contain multiple strings of English or non-English words. Sometimes the file has garbage strings; I want to identify those strings, skip them, and process the others doc =…

python python-2.7 python-unicode mojibake

asked Mar 16 '15 at 07:58

Shashi
