
In one of my database tables I found some corrupted words like:

Noël, japón, Świata

which I later found should be:

Noël, japón, świata

Does anyone know how to convert them back to normal using PHP?

Buntu Linux
  • Do you know why they have been corrupted? Maybe due to a wrong charset in the table, or was it the PHP insert/update? – malta Feb 08 '14 at 13:52
  • Yes @Micallef! Most probably due to a database transfer! The collation in the row is now set to "utf8_general_ci", but I just want to use the data in the table and am not going to insert any more. Is there any way to get them back to normal? – Buntu Linux Feb 08 '14 at 14:01
  • you may use `preg_replace` to replace the characters if they are consistently incorrect – malta Feb 08 '14 at 14:05
  • Well @Micallef, there are a lot of characters. It's hard to find which represents which. Latin to Arabic – Buntu Linux Feb 08 '14 at 14:22
  • any chance you can redo the database transfer? – malta Feb 08 '14 at 14:53
  • Yes @Micallef, I'll give it a try and tell you. – Buntu Linux Feb 08 '14 at 15:09

2 Answers

1

Unfortunately, it's not reversible using PHP's conversion functions. I've just written a PHP script that tries every combination of source and target encoding, applied repeatedly (up to 5 times), and none of them yields "japón". So it's not possible.

script:

<?php
// Try every (target, source) pair of encodings that mbstring knows about.
$encodings = mb_list_encodings();
foreach ($encodings as $enc_to) {
    foreach ($encodings as $enc_from) {
        $str="Noël, japón, Świata";
        // Re-apply the same conversion up to 5 times to catch repeated mis-encoding.
        for ($i = 0; $i < 5; $i++) {
            $str = mb_convert_encoding($str, $enc_to, $enc_from);
            echo "$enc_from -> $enc_to ($i): ".$str."\n";
            echo "$enc_from -> $enc_to ($i) + html_entity_decode: ".html_entity_decode($str)."\n";
            echo "$enc_from -> $enc_to ($i) + htmlspecialchars_decode: ".htmlspecialchars_decode($str)."\n";
            echo "$enc_from -> $enc_to ($i) + urldecode: ".urldecode($str)."\n";
            echo "$enc_from -> $enc_to ($i) + htmlentities: ".htmlentities($str)."\n";
            echo "$enc_from -> $enc_to ($i) + htmlspecialchars: ".htmlspecialchars($str)."\n";
            echo "$enc_from -> $enc_to ($i) + urlencode: ".urlencode($str)."\n";
        }
    }
}

... grepping the output finds no "japón".

1

Alternatively, you can check whether the problem is related to the character encoding by using iconv; see the PHP manual.
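As a rough sketch of such an iconv check: the snippet below assumes the text went through exactly one round of "UTF-8 bytes read as Windows-1252", which is only a guess about what happened during the transfer. The function name `undo_mojibake_once` is a hypothetical helper, not something from the manual. It returns null when the input does not fit that assumption, so it can be used to test a string without overwriting anything.

```php
<?php
// Hypothetical helper: undo ONE layer of "UTF-8 bytes read as Windows-1252".
// Returns the repaired string, or null if the input does not fit that pattern.
function undo_mojibake_once(string $text): ?string
{
    // Map each character back to the single byte it stood for in Windows-1252.
    // iconv() returns false if a character has no Windows-1252 equivalent.
    $bytes = @iconv('UTF-8', 'Windows-1252', $text);
    if ($bytes === false) {
        return null;
    }
    // The repair only makes sense if the recovered bytes are valid UTF-8.
    return mb_check_encoding($bytes, 'UTF-8') ? $bytes : null;
}

echo undo_mojibake_once('Noël'), "\n";   // Noël
echo undo_mojibake_once('Świata'), "\n"; // Świata
```

A null result means the string was damaged in some other way (for example converted more than once, or with bytes lost along the way), in which case re-importing the original dump with the right character set is likely the safer route.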

malta
  • I made a mistake while importing: I did not set the encoding! `mysql -u USERNAME -pPASSWORD --default_character_set utf8 DATABASE < file.sql` Now everything is back to normal. – Buntu Linux Feb 09 '14 at 04:22