I am having trouble with the utf8_encode() function.

Here's an example:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <?php
    header("Content-Type: text/html; charset=utf-8");
    $str = "Şşİğ";
    echo utf8_encode($str);
    ?>

The output I see is:

SsIg (the third character is a capital I)

If I don't use utf8_encode(), this is what I get:

ÅÅİÄ

So this doesn't really work for some languages. It produces output that only partly makes sense instead of the correct text.

Thanks

Arefi Clayton
  • There's probably an extra byte in there that can't be converted, in which case it just gets ignored. Check the docs for that function; you should be able to customize what gets ignored or forced when something can't be converted to UTF-8. – Rooster Nov 01 '15 at 23:39
  • What's the encoding of the PHP file? – Karoly Horvath Nov 01 '15 at 23:39
  • @KarolyHorvath It's also UTF-8. Rooster, I'll try that, thank you. – Arefi Clayton Nov 01 '15 at 23:43
  • If it is UTF-8, why encode it to UTF-8? I can reproduce this behavior, and taking out the utf8_encode call resolves it. That function is for: `Encodes an ISO-8859-1 string to UTF-8`. – chris85 Nov 01 '15 at 23:47
  • Interesting ... you should send `header("Content-Type: text/html; charset=utf-8");` before any content though. – jpaljasma Nov 02 '15 at 00:03

1 Answer

If the string is already encoded as UTF-8 (as opposed to ISO-8859-1 or ISO-8859-15), you don't need to do anything:

utf8_encode — Encodes an ISO-8859-1 string to UTF-8

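Actually, running utf8_encode on a string which is already UTF-8 is bound to wreak some kind of havoc: the function treats every input byte as a separate ISO-8859-1 character, so each byte of a multibyte character gets encoded again on its own.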
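To make the double encoding visible, here is a minimal sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded. It uses `mb_convert_encoding()` from ISO-8859-1 in place of `utf8_encode()`, which performs the same conversion, and `to_utf8_if_needed()` is a hypothetical helper name, not something from the question.

```php
<?php
// Assumption: this file is saved as UTF-8, so $str holds 8 bytes (2 per character).
$str = "Şşİğ";

// Same conversion utf8_encode() performs: every input byte is treated as
// one ISO-8859-1 character and re-encoded to UTF-8.
$double = mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');

error_log(bin2hex($str));    // original bytes
error_log(bin2hex($double)); // twice as long: each byte was encoded again

// Hypothetical helper: convert only when the input is not already valid UTF-8.
function to_utf8_if_needed(string $s): string
{
    return mb_check_encoding($s, 'UTF-8')
        ? $s
        : mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

// An already-UTF-8 string passes through unchanged.
var_dump(to_utf8_if_needed($str) === $str);
```

The validity check is only a heuristic: a short ISO-8859-1 string can happen to be valid UTF-8 as well, so knowing the real source encoding is still the safer option.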

You say that the file encoding is UTF-8, but what you get looks like ISO-8859. So I suspect something is interfering with the encoding chain.

Verify the Content-Type header (i.e. verify that the one you set is, indeed, the one that gets sent), double check the file encoding, and the browser setting as well (it should be either UTF8 or autodetect).
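Two of these checks can be sketched from inside the script itself. This is only an illustration under assumptions: it logs through `error_log()`, and the hex value in the comment holds only if the file really is saved as UTF-8. Unlike the snippet in the question, it sends the header before any output, since `header()` has no effect once output has started.

```php
<?php
// headers_sent() fills $file and $line with the place where output began.
if (headers_sent($file, $line)) {
    // Too late: the Content-Type that was set will not be the one that gets sent.
    error_log("Output already started at $file:$line");
} else {
    header('Content-Type: text/html; charset=utf-8');
}

$str = "Şşİğ";

// If the file is saved as UTF-8, this logs c59ec59fc4b0c49f.
// Any other value suggests the editor saved the file in a different encoding.
error_log(bin2hex($str));

echo $str;
```

The header the browser actually received can then be compared against this in the browser's network inspector.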

Also, it is quite strange that you should get "SsIg" -- that is definitely not the expected behaviour of UTF8 encoding. It almost seems that something is trying to map your characters back into the ASCII set by mapping them to the most similar ASCII character. I'd therefore also check any proxies or caches or anything in the middle which is in position to manipulate the data sent by your script.

LSerni
  • I checked and everything seems to be right. The weird part is, when I use `mb_detect_encoding($str, "auto")` on variables that I've created, it says they are already UTF-8. However, I still see some weird output and can't understand why. – Arefi Clayton Nov 02 '15 at 00:05
  • Okay, I've fixed it. I was using EasyPHP for Windows and I went for a re-install. I double-checked everything you said and it's working now. Besides fixing my problem, you gave some pretty good information here. Thank you so much. – Arefi Clayton Nov 02 '15 at 00:11