0

I'm going through some feeds of Spanish blogs and saving them in a database. Take the word Diseño! , for example: it displays correctly in the script that fetches the feeds, but once it is saved to the database it is stored as Diseñó! . My database is set to utf8 . I think I've followed every question that looks like this one, but nothing has fixed it. I also changed the charset in my HTML from utf8 to iso-8859; the text still displays correctly in the HTML, but not once it is saved to the database. Does anybody have a solution? Thanks!

raygo

3 Answers

0

At a guess, the page that is the source of the "Diseño!" is probably encoded in iso-8859-1 or windows-1252, and it's being stored in your database without any conversion.

If that's the case, you need to convert the string from its source encoding to utf-8, using something like mb_convert_encoding: http://php.net/manual/en/function.mb-convert-encoding.php
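A minimal sketch of that conversion, assuming the mbstring extension is loaded and that any input which is not valid utf-8 is really iso-8859-1. The helper name `to_utf8` and the fallback encoding are illustrative assumptions, not part of SimplePie or of the asker's code.

```php
<?php
// Hypothetical helper: normalise a string of uncertain origin to UTF-8.
function to_utf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    // Valid UTF-8 is returned untouched so it is never encoded twice.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise assume the fallback encoding and convert.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

$latin1 = "Dise\xF1o!";      // "Diseño!" as ISO-8859-1 bytes
$utf8   = "Dise\xC3\xB1o!";  // the same word already in UTF-8

var_dump(to_utf8($latin1) === $utf8); // latin1 input is converted
var_dump(to_utf8($utf8) === $utf8);   // UTF-8 input passes through unchanged
```

Checking validity first matters because converting a string that is already utf-8 would damage it rather than fix it.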

Brenton Fletcher
  • If the source documents are encoded in utf-8, and the content is *actually* encoded in utf-8 (pages lie about their encoding more often than you'd think), and your database encoding is utf-8, then something is modifying the strings along the way. In that case we'd need to see your code, at least the relevant parts, to be able to help solve the problem. – Brenton Fletcher Oct 18 '12 at 03:53
  • Well, I'm just using SimplePie to get the feeds and then I'm submitting them to the database, nothing fancy. If you want I can paste the code, but it's pretty much the demo from SimplePie. – raygo Oct 18 '12 at 04:00
  • Could you possibly link to one of the blog feeds that are problematic? – Brenton Fletcher Oct 18 '12 at 04:01
  • For example, http://conojoscuriososenrepdom.blogspot.com/2012/10/las-mariposas-monarca-son-conocidas-por.html doesn't show the tilde in the title. – raygo Oct 18 '12 at 04:04
  • For the specific URL you gave, I can't see any issues, either in the feed XML, or, using the SimplePie demo directly. – Brenton Fletcher Oct 18 '12 at 04:10
0

Are you using SimplePie_Cache_MySQL? It seems to have a bug: it doesn't set the encoding for the database connection. The connection then silently converts data from latin1 (the default) to utf8, even when the data is already utf8, so it ends up encoded twice.

To fix this, add encoding=utf8 to the connection parameters in SimplePie_Cache_MySQL.php.
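Wherever it happens, this kind of double conversion is easy to reproduce without a database. The sketch below assumes the mbstring extension and uses ISO-8859-1 as a stand-in for MySQL's latin1, which is an approximation that holds for the bytes in this word.

```php
<?php
$utf8 = "Dise\xC3\xB1o!"; // "Diseño!" correctly encoded as UTF-8

// Misread the UTF-8 bytes as latin1 and convert them to UTF-8 again,
// which is what a connection with the wrong charset effectively does.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // bytes of the correct string
echo bin2hex($double), "\n"; // each byte of "ñ" has become a two-byte sequence

// Reversing the conversion recovers the original bytes.
$repaired = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $utf8);
```

The two bytes of "ñ" turn into four, which a utf-8 page then renders as two separate accented characters instead of one.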

You have another problem too: when you read the database later you are not setting the page encoding to utf-8. This means that correctly encoded data is shown munged.

Update: On a closer look, it seems that SimplePie_Cache_MySQL is OK, the problem must be elsewhere.

Joni
-1

You should set the character encoding right after establishing the connection to the DB, before running any queries.

You can use this (note that MySQL calls the charset utf8, without a hyphen):

mysql_set_charset( 'utf8' );

You can also look up this function on php.net and follow the recommended links to alternatives such as mysqli or PDO; the old mysql_* extension is deprecated as of PHP 5.5 and removed in PHP 7.
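With PDO, one way to do the same thing is to put the charset in the DSN, which PHP honours from 5.3.6 onwards; with mysqli the equivalent call is `$db->set_charset('utf8')` on an open connection. The sketch below is only an illustration: the helper names `utf8_dsn` and `connect_utf8` are hypothetical, and the host, database name and credentials are placeholders.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that fixes the connection charset.
function utf8_dsn(string $host, string $dbname): string
{
    return "mysql:host={$host};dbname={$dbname};charset=utf8";
}

// Hypothetical helper: open a connection that talks utf8 from the first query.
// Needs the pdo_mysql extension and a reachable server when it is called.
function connect_utf8(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(utf8_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // fail loudly on errors
    ]);
}

echo utf8_dsn('localhost', 'feeds'), "\n";
```

MySQL's utf8 charset only covers characters of up to three bytes; utf8mb4 is the wider choice if the feeds may contain emoji or other four-byte characters, provided the tables use it too.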

Bye

PatomaS