I recently installed PHP 5.4 on Ubuntu 12.10 via apt-get.

PHP Info shows: PHP Version 5.4.6-1ubuntu1

I installed all the common packages (mysql, pgsql, curl, etc.) and made no other changes, but I have a problem.

I like using the ISO-8859-1/latin1 encoding in my files and databases, because that is where I got the best workflow. This is now a problem, because PHP does not seem to get along with exceptions whose messages are encoded that way.

To clarify, I created a test file like this:

<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);

throw new Exception('é');

If the file above is saved as UTF-8, everything works. With Xdebug enabled I get:

( ! ) Fatal error: Uncaught exception 'Exception' with message 'é' in /home/henrique/public/teste.php on line 5
( ! ) Exception: é in /home/henrique/public/teste.php on line 5
Call Stack
#   Time    Memory  Function    Location
1   0.0002  124212  {main}( )   ../teste.php:0

If the file is saved as ISO-8859-1 and Xdebug is enabled, the message is simply not displayed:

( ! ) Fatal error: in /home/henrique/public/teste.php on line 5
( ! ) Exception: in /home/henrique/public/teste.php on line 5
Call Stack
#   Time    Memory  Function    Location
1   0.0002  124436  {main}( )   ../teste.php:0

However, without Xdebug, all I get is this "very clarifying" message:

Fatal error: in /home/henrique/public/teste.php on line 5

Maybe it's a problem within Apache, because when I try the same using the command line, I get:

Stack trace:
#0 {main}
  thrown in /home/henrique/public/teste.php on line 5

Fatal error: Uncaught exception 'Exception' with message '�' in /home/henrique/public/teste.php on line 5

Exception: � in /home/henrique/public/teste.php on line 5

Call Stack:
    0.0002     121256   1. {main}() /home/henrique/public/teste.php:0

The message is still there. It's illegible, but it is there.

Edit

I also tried with Lighttpd 1.4.28 and the results were the same.

Edit 2:

Tried with PHP 5.4 built-in server and got this on my terminal:

[Wed Jun  5 21:32:08 2013] PHP Fatal error:  Uncaught exception 'Exception' with message '�' in /var/www/test2.php:9
Stack trace:
#0 {main}
  thrown in /var/www/test2.php on line 9
[Wed Jun  5 21:32:08 2013] 127.0.0.1:55116 [200]: /test2.php - Uncaught exception 'Exception' with message '�' in /var/www/test2.php:9
Stack trace:
#0 {main}
  thrown in /var/www/test2.php on line 9

But in the browser, still the same problem.

Henrique Barcelos
  • Try changing the default charset for headers (http://php.net/manual/en/ini.core.php#ini.default-charset); since PHP 5.4 it is UTF-8 by default. – dev-null-dweller Nov 04 '12 at 18:04
  • I changed it, but it didn't work; still the same problem. (Yes, I restarted Apache.) – Henrique Barcelos Nov 04 '12 at 18:11
  • Try changing the character encoding in your terminal. By default Ubuntu uses UTF-8. – claustrofob May 24 '13 at 09:31
  • Not what you want to hear, but you should consider setting everything to UTF-8, always, even in the DB (don't forget to set the connection encoding in your MySQL connection) and other external parts. You will have loads of problems with transcoding back and forth if you don't. UTF-8 supports all characters; there is no need for anything else ;) – ToBe May 24 '13 at 10:26
  • @ToBe the problem was that I already had a huge system written in latin1/iso-8859-1, but there was no other way: I had to convert all the files, sadly. The database was the minor problem, since it supports multiple collations once you are connected (at least MySQL does). But I still don't get why this happens. – Henrique Barcelos May 24 '13 at 19:22
  • @claustrofob There is no problem with the terminal, since the exception is shown, just with the characters messed up. – Henrique Barcelos May 27 '13 at 18:45
  • @Henrique Barcelos as I said, the Ubuntu terminal uses UTF-8 as its default encoding. When you run your script in the terminal, you get exception text in ISO-8859-1 displayed as UTF-8. So if you change your terminal window encoding to ISO-8859-1 you will see your characters. – claustrofob May 27 '13 at 18:58
  • @claustrofob Yes, but again: this is not the problem. Read the question again, please. The problem is that no exception is shown when I access the script FROM THE BROWSER. – Henrique Barcelos May 27 '13 at 22:55
  • throw new Exception(utf8_encode('é')); – Alejandro Iván May 28 '13 at 23:07
  • This might imply that printing the é is causing an exception which is not handled. Can you see the exception if you either handle it or print this character before the throw statement? Another 2 cents: might the empty page being displayed come from the cache? – Loek Bergman May 29 '13 at 11:15
  • @AlejandroIván calling utf8_encode on all my exceptions is a bit of overhead, but yes, this will work. – Henrique Barcelos May 29 '13 at 11:58
  • @LoekBergman, I tried this on two different PCs, so it is not a cache problem. When I do what you said, `echo 'é'` before the exception works just as expected (the 'é' is shown on screen), but on the very next line I throw the exception and the scenario remains the same: no message shown. – Henrique Barcelos May 29 '13 at 12:00
  • The messages shown depend on the encoding of your terminal. You are using a UTF-8 encoded one, so if your files were written in ISO-8859 or Latin, then this will obviously happen. You can: 1) Use utf8_encode() for every message (annoying). 2) Rewrite your script messages using the correct encoding (UTF-8). 3) Change the encoding of the terminal you are using (more annoying and problematic than the first option if you will use this on different computers). I don't know what other options you have. The best one would be 2). – Alejandro Iván May 31 '13 at 00:56
  • @AlejandroIván I'll say this one last time: READ THE QUESTION. This has absolutely NOTHING TO DO WITH THE TERMINAL. I am not complaining about the character encoding; I know how encodings work. The problem is that when I throw the exception with non-UTF-8 characters, it is not shown, as I show in the 3rd block of code in the question. – Henrique Barcelos May 31 '13 at 12:09

4 Answers

1

Have you tried this on a different server?

I think it is your configuration. I created a test file on my server; you can view it here: http://cai.tlacaelelrl.com/tests/test.php

The contents are:

    ini_set('display_errors', 1);
    error_reporting(E_ALL);
    print 'Character encoding is: '.mb_internal_encoding();
    throw new Exception('é');

The character set is applied to the file, I also added the character set to the htaccess file.

I am not sure whether Xdebug is involved, because I could not test with it enabled.

Can you try adding this to your .htaccess file?

    AddCharset ISO-8859-1 .php

1

The exception message in PHP is a string, which is probably no news to you.

Strings in PHP are binary. PHP does not care about the encoding of their contents; a string simply preserves whatever bytes you put into it. Each byte (8 bits, an octet) counts as one character of the string, so substring access like $string[10] returns the 11th byte.

This ensures that whatever bytes you write into the message are passed to the output unchanged.
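As a small illustration of the byte-oriented behaviour described here, the sketch below builds 'é' in both encodings using escape sequences, so the result does not depend on how the script file itself is saved. The variable names are only for illustration.

```php
<?php
// 'é' written as explicit bytes, independent of the file's own encoding.
$latin1 = "\xE9";      // ISO-8859-1: a single byte
$utf8   = "\xC3\xA9";  // UTF-8: two bytes

// strlen() counts bytes, not characters.
echo strlen($latin1), "\n";   // 1
echo strlen($utf8), "\n";     // 2

// Offset access returns a single byte, here the first half of the UTF-8 sequence.
echo bin2hex($utf8[0]), "\n"; // c3
```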

So the only difference is how you display the output. Say the exception message string is Latin-1 encoded, you output it via your Apache server, and your browser (we don't care about the reason for now) displays it as UTF-8: you will see the question mark in a diamond, �.

Same applies to the terminal if the terminal displays it as UTF-8.

Or if you save the output into a file and then you open that file in your editor as being UTF-8 encoded.

So how do you fix that? For your browser, look in its documentation for how to tell it which encoding the website you're currently viewing should be displayed in. Every browser I know of has some kind of menu where you can specify it. The charset you use is common, so even older browsers have it.

Same applies to the terminal. You can set the locale of the shell as well as the encoding for the terminal. Consult the documentation of the shell you're using.

For the text file, I bet you already know how to deal with it: check which options your editor provides.


A final note of caution: if you want to properly analyze what your server returns for a request containing the exception message output, use your browser's developer tools to make the server's response headers visible. You will likely see a change from your previous configuration that (in error) says the content is UTF-8 encoded while the encoding is actually Latin-1. Fix that error if you don't want to change the encoding in the browser manually. To do that, consult the PHP documentation and the documentation of your web server.

hakre
1

ab@php.net came up with an explanation:

https://bugs.php.net/bug.php?id=63426&edit=2

The reason is simple, but a fix would be complex. Since 5.4, PHP's internal encoding is UTF-8, where it was latin1 before. Almost nothing else has changed.

Every error message shown in an HTML context needs to have its entities converted. For that, the same functionality as in htmlspecialchars() is used. Before PHP 5.4 it was forced to use latin1; now it is forced to use UTF-8. This is by design. Using header() with a Content-Type, or default_charset, affects only the sending of the Content-Type header.

Thus, your error text is in latin1, but UTF-8 is used to convert the entities, and the conversion dies at the first invalid character. The relevant place in the code is http://lxr.php.net/xref/PHP_5_4/main/main.c#1083 ; subsequently, determine_charset() delivers UTF-8 as the conversion charset. That's why your accented character is swallowed. It is also why Hui couldn't reproduce this: if you look at his earlier post, latin1 is indeed sent in the Content-Type, but a UTF-8 encoded PHP script was obviously used, so the error message is "Fatal error: Uncaught exception 'Exception' with message 'é' in ...". The current behaviour, however, doesn't force you to keep your scripts in UTF-8: in a script encoded in latin1 you can still throw the exception using utf8_encode('é'). It works with the CLI because no HTML entities have to be encoded there, so the characters are passed to the output as is.
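The effect can be reproduced outside the error handler by calling htmlspecialchars() directly. This is a minimal sketch, not the code path in main.c: it passes ENT_COMPAT explicitly so that invalid input is not substituted, and it uses mb_convert_encoding() (which needs the mbstring extension) in place of utf8_encode(), which is deprecated in recent PHP versions.

```php
<?php
// "\xE9" is 'é' as a single ISO-8859-1 byte; on its own it is invalid UTF-8.
$latin1 = "\xE9";

// Escaping as UTF-8: the invalid byte makes htmlspecialchars()
// return an empty string, so the whole message disappears.
$asUtf8 = htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8');

// Escaping with the charset that matches the data keeps the byte.
$asLatin1 = htmlspecialchars($latin1, ENT_COMPAT, 'ISO-8859-1');

// Converting the message to UTF-8 first also survives the UTF-8 escaping.
$converted = htmlspecialchars(
    mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1'),
    ENT_COMPAT,
    'UTF-8'
);

var_dump($asUtf8 === '');      // bool(true)
var_dump(bin2hex($asLatin1));  // string(2) "e9"
var_dump(bin2hex($converted)); // string(4) "c3a9"
```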

All of this means the issue was always there, but it used to favour users with a default of iso-8859-1. Now users with a default of UTF-8 benefit. Looking through the code, solving this might require more global intrusion than this ticket alone warrants.

For htmlspecialchars() behaviour change see also bug #61354

Henrique Barcelos
0

I have the same problem and didn't find a good solution ("AddCharset ISO-8859-1 .php" in .htaccess doesn't work). You can use this:

throw new Exception(htmlentities('é', ENT_COMPAT, 'ISO-8859-1'));

But Xdebug will show:

&eacute;

It's better than nothing.
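To avoid repeating a conversion at every throw site, one option is a small exception subclass that converts its message once. This is only a sketch under assumptions: the class name Latin1Exception is hypothetical, the source files are assumed to be ISO-8859-1, and mb_convert_encoding() (mbstring extension) stands in for utf8_encode().

```php
<?php
// Hypothetical helper class: converts an ISO-8859-1 message to UTF-8 on
// construction, so UTF-8 based HTML escaping of error output does not drop it.
class Latin1Exception extends Exception
{
    public function __construct($message = '', $code = 0, $previous = null)
    {
        parent::__construct(
            mb_convert_encoding($message, 'UTF-8', 'ISO-8859-1'),
            $code,
            $previous
        );
    }
}

// "\xE9" is the single ISO-8859-1 byte for 'é'.
$e = new Latin1Exception("\xE9");
echo bin2hex($e->getMessage()), "\n"; // c3a9, the UTF-8 bytes for 'é'
```

Unlike the htmlentities() approach, the message then holds the real character rather than an entity, so it also reads correctly on a UTF-8 terminal.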