I have a character-encoding issue that I don't know how to solve. I'm building a website that needs to fetch feeds from custom news sites and store them in my own database.

Some feeds are stored fine, with the German umlauts intact (ä, ü, ß). In other feeds, however, the umlauts get mangled, for example in strings such as "Java für Mac" or "Fehler in CoreText lässt OS-X- und iOS-Apps abstürzen".

The database collation is utf8_general_ci, and when I save a value directly in the database, the German umlauts are stored correctly.

To load the feeds I use the SimplePie 1.3 library, and I have also set its input and output character encoding to UTF-8.

Tudor Ravoiu
  • [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) - If the data is coming from different sources, apparently in different encodings, you need to parse their meta data (HTTP headers perhaps) and convert them accordingly. – deceze Sep 03 '13 at 05:48
  • Apart from what @deceze commented (which is a good read, by the way), there may be an additional issue: since encoding is hard to understand for many programmers and web authors, who says it isn't the feed itself that contains the wrong data? – hakre Sep 03 '13 at 06:00
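
The two comments above can be illustrated with a small sketch. It is written in Python purely for illustration (the question's stack is PHP with SimplePie), and `decode_feed` and `repair_double_encoding` are hypothetical helper names, not part of any library. It assumes the garbling comes from UTF-8 bytes being read as ISO-8859-1, which is a common cause but is not confirmed for these particular feeds.

```python
def decode_feed(raw: bytes, declared_charset: str) -> str:
    """Decode raw feed bytes with the charset the source declared
    (for example in its HTTP header or XML prolog)."""
    return raw.decode(declared_charset)


def repair_double_encoding(text: str) -> str:
    """Undo one round of 'UTF-8 bytes read as ISO-8859-1' garbling.

    Returns the text unchanged when it does not look garbled that way.
    """
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


headline = "Java für Mac"

# Two hypothetical sources sending the same headline in different charsets.
latin1_bytes = headline.encode("iso-8859-1")
utf8_bytes = headline.encode("utf-8")

# Decoding each with its own declared charset gives identical text.
same = decode_feed(latin1_bytes, "iso-8859-1") == decode_feed(utf8_bytes, "utf-8")

# Decoding UTF-8 bytes with the wrong charset garbles the umlaut.
garbled = decode_feed(utf8_bytes, "iso-8859-1")

# If a feed already delivers garbled text, one reversal may recover it.
repaired = repair_double_encoding(garbled)
```

In PHP, the equivalent per-source conversion would typically go through `mb_convert_encoding` or `iconv`. The repair step is only a heuristic: it can misfire on text that legitimately contains such character pairs, so it is safer to fix the declared charset at the source where possible.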

1 Answer

You probably have no charset declared in your HTML document.

tea2code
  • I've set `header('Content-Type: text/html; charset=utf-8');` right at the beginning of the document, and it's still not working. – Tudor Ravoiu Sep 03 '13 at 06:35
  • The HTML code itself needs a [charset option](http://www.w3schools.com/tags/att_meta_charset.asp): `<meta http-equiv="Content-Type" content="text/html; charset=utf-8">` or, for HTML5, `<meta charset="utf-8">` – tea2code Sep 03 '13 at 07:25
  • @tea It's a good idea to include an HTML charset meta tag, but it's *not needed*. The HTTP header always takes precedence and should always be set. Furthermore, the OP says that it does work for some sources and not for others. – deceze Sep 03 '13 at 07:45
  • Every day there's something new to learn. Thanks, deceze. Another problem could be a messed-up template with the wrong encoding. It seems the BOM can take precedence over the header: http://stackoverflow.com/questions/7102925/prefer-charset-declaration-in-html-meta-tag-or-http-header – tea2code Sep 03 '13 at 08:11
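
The precedence discussed in these comments (BOM first, then the HTTP header, then the meta tag) can be sketched roughly as follows. This is a simplified illustration in Python, not what browsers or SimplePie actually run: `pick_charset` is a hypothetical helper, the regular expressions are deliberately naive, and only the UTF-8 and UTF-16 BOMs are checked.

```python
import codecs
import re
from typing import Optional

# Byte order marks checked in this sketch, with the charset each implies.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
)


def pick_charset(body: bytes, content_type: Optional[str]) -> Optional[str]:
    """Return the charset a document most likely declares, or None."""
    # 1. A BOM at the very start of the body wins.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name

    # 2. Otherwise use the charset parameter of the HTTP Content-Type header.
    if content_type:
        match = re.search(r'charset="?([\w-]+)', content_type, re.IGNORECASE)
        if match:
            return match.group(1).lower()

    # 3. Finally, look for a meta charset declaration near the top of the HTML.
    match = re.search(
        rb"""<meta[^>]*charset=["']?([\w-]+)""", body[:1024], re.IGNORECASE
    )
    if match:
        return match.group(1).decode("ascii").lower()

    return None
```

For example, `pick_charset(b'<meta charset="iso-8859-1">', "text/html; charset=UTF-8")` returns `"utf-8"` because the header outranks the meta tag, while a body that starts with a UTF-8 BOM yields `"utf-8"` whatever the header says.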