I'm losing accented characters.

From PHP I download an XML file that is encoded in UTF-8, while my PHP script uses Latin1. I can't manage to convert the UTF-8 into Latin1.

I've tried this:

$meta=mb_convert_encoding($meta,'CP1252','UTF-8');

and

$meta=mb_convert_encoding($meta,'UTF-8');
$meta=mb_convert_encoding($meta,'CP1252','UTF-8');

But either way the accented characters are broken and turned into 2 characters.

Input:

<title>First book of zoölogy</title>

Output:

<title>First book of zoo?logy</title>
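For illustration, a small PHP sketch (not from the thread) that looks at the bytes involved. A precomposed ö (U+00F6) is two bytes in UTF-8 and converts cleanly to the single CP1252 byte f6. If the ö instead arrives decomposed, as "o" followed by U+0308 COMBINING DIAERESIS (two characters), CP1252 has no slot for the combining mark and mb_convert_encoding substitutes "?", which would give exactly "zoo?logy". Whether the downloaded XML really uses the decomposed form is an assumption the question does not confirm. The sketch assumes the mbstring extension is enabled.

```php
<?php
// Sketch: two ways "ö" can appear in UTF-8 input, and what mb_convert_encoding
// does with each when the target is CP1252. Requires the mbstring extension.

// Precomposed: ö is the single code point U+00F6 (UTF-8 bytes c3 b6).
$precomposed = "zo\u{00F6}logy";

// Decomposed (a hypothesis for this question, not confirmed): "o" followed by
// U+0308 COMBINING DIAERESIS (UTF-8 bytes cc 88).
$decomposed = "zoo\u{0308}logy";

$a = mb_convert_encoding($precomposed, 'CP1252', 'UTF-8');
$b = mb_convert_encoding($decomposed, 'CP1252', 'UTF-8');

echo bin2hex($precomposed), "\n"; // 7a6fc3b66c6f6779
echo bin2hex($a), "\n";           // 7a6ff66c6f6779: ö is now the single byte f6
echo $b, "\n";                    // zoo?logy: U+0308 has no CP1252 mapping, so the
                                  // default substitute character "?" is used
```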

I figured it out myself, see my answer below. Thank you everyone for your help!

Alasdair

3 Answers

Change the collation of the tables to utf8_general_ci and, before connecting to the database, use:

mysql_set_charset("utf8");

I think this can solve your problem.

  • Everything is set to Latin1 because that is what I want to use, not UTF-8. I already set mysql_set_charset to Latin1. The accented characters are converted from UTF-8 into Latin1 right at the beginning, and no UTF-8 is used after that, but somewhere the characters get corrupted. – Alasdair Mar 19 '12 at 03:50
  • @Alasdair Why do you want to keep using Latin1 when UTF-8 offers so much more? – Ben Mar 19 '12 at 04:07
  • Because it uses fewer bytes and I don't need UTF-8. – Alasdair Mar 19 '12 at 04:08
  • @Alasdair "Fewer bytes" isn't necessarily true: UTF-8 is a variable-width encoding, and plain ASCII characters take one byte each, exactly as in Latin1. You may not need UTF-8 now, but you'll be glad you chose it if you ever have to expand whatever you're doing to international character sets. – Ben Mar 19 '12 at 04:12
  • This will never expand beyond the Latin alphabet, and the reason for using Latin1 specifically is so that I can have a searchable index on 255 chars on the MySQL database. – Alasdair Mar 19 '12 at 04:27
  • @Alasdair A utf8 _varchar_ column of length 255 lets you store 255 UTF-8 _characters_ and create an index on it (this is not true for a _char_ column). – Ben Mar 19 '12 at 05:08
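To put numbers on the variable-width point above, a sketch comparing byte length (strlen) with character length (mb_strlen) for the title from the question, stored as UTF-8 and as CP1252. It assumes mbstring is enabled and says nothing about how MySQL itself sizes columns or indexes.

```php
<?php
// Sketch: byte length (strlen) versus character length (mb_strlen) of the
// question's title in UTF-8 and in CP1252. Requires the mbstring extension.

$utf8   = "First book of zo\u{00F6}logy";
$cp1252 = mb_convert_encoding($utf8, 'CP1252', 'UTF-8');

// Same number of characters in both encodings.
echo mb_strlen($utf8, 'UTF-8'), ' ', mb_strlen($cp1252, 'CP1252'), "\n"; // 21 21

// Bytes: every ASCII character is 1 byte in both; only "ö" needs 2 bytes in UTF-8.
echo strlen($utf8), ' ', strlen($cp1252), "\n";                          // 22 21
```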

This fixed it:

$meta=iconv('UTF-8','CP1252//TRANSLIT',$meta);

I didn't know about iconv before; I thought the mb_* (mbstring) functions were the only option, but iconv works very well.
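A minimal sketch of this fix applied to the sample title, with a round trip back to UTF-8 to check that nothing was lost. iconv returns false when it cannot convert, so the sketch checks for that. What //TRANSLIT substitutes for characters CP1252 lacks depends on the iconv library PHP was built against, so the example only uses a character CP1252 can represent directly.

```php
<?php
// Sketch of the iconv fix on the question's sample, plus a round trip back to
// UTF-8 to confirm nothing was lost. Requires the iconv extension.

$meta = "<title>First book of zo\u{00F6}logy</title>"; // UTF-8 input

// //TRANSLIT asks iconv to approximate characters CP1252 lacks; how it does
// that depends on the iconv library PHP was built with. "ö" needs no
// approximation: it maps directly to the CP1252 byte f6.
$latin = iconv('UTF-8', 'CP1252//TRANSLIT', $meta);
if ($latin === false) {
    throw new RuntimeException('iconv could not convert $meta to CP1252');
}

echo strlen($meta), ' ', strlen($latin), "\n"; // 37 36
echo iconv('CP1252', 'UTF-8', $latin), "\n";   // <title>First book of zoölogy</title>
```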

Alasdair

Maybe the default charset of your MySQL server is UTF-8. Try running the following query right after connecting to MySQL:

mysql_query("SET NAMES latin1");
Valeh Hajiyev
  • It's not, the default is latin1, and it seems now that this problem is occurring before the string goes into the database, so it's a PHP problem and not a MySQL problem. – Alasdair Mar 19 '12 at 04:20