
I'm trying to output a string HTML-encoded, but the htmlentities() function always returns an empty string.

I know exactly why it does this: I am not running PHP 5.4. I have the latest PHP 5.3 release installed.

The question is how I can HTML-encode a string that contains invalid code unit sequences.

According to the manual, ENT_SUBSTITUTE is the way to go, but this constant is not defined in PHP 5.3.x.

I did this:

if (!defined('ENT_SUBSTITUTE')) {
    define('ENT_SUBSTITUTE', 8);
}

Still no luck: htmlentities() still returns an empty string.

I wanted to try ENT_DISALLOWED instead, but I cannot find its integer value.

So my question is twofold:

  1. What's the constant value of PHP 5.4's ENT_DISALLOWED?

  2. How can I strip non-UTF-8 characters (such as smart quotes) out of a string? Not just smart quotes, but anything that causes htmlentities() to return an empty string.

JasonMArcher
Average Joe

2 Answers


It is true that htmlentities() in PHP 5.3 does not have the ENT_SUBSTITUTE flag; however, it does have the (not really recommended) ENT_IGNORE flag. Beware of the note in the manual and try to understand it before using the flag:

Using this flag is discouraged as it may have security implications.
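A minimal sketch of what ENT_IGNORE changes, assuming the input is supposed to be UTF-8 but contains a stray ISO-8859-1/Windows-1252 byte (0xE9, "é"); the sample string is made up for illustration:

```php
<?php
// "caf\xE9" is "café" encoded as ISO-8859-1/Windows-1252, which is NOT valid UTF-8.
$input = "caf\xE9";

// No flag for invalid sequences: htmlentities() gives up and returns "".
$plain = htmlentities($input, ENT_QUOTES, 'UTF-8');

// ENT_IGNORE (available since PHP 5.3.0) silently drops the invalid byte instead.
$ignored = htmlentities($input, ENT_QUOTES | ENT_IGNORE, 'UTF-8');

var_dump($plain);   // string(0) ""
var_dump($ignored); // string(3) "caf"
```

The "é" vanishes without any warning, which hints at why the manual discourages the flag: bytes are removed silently rather than reported.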

It is far better to understand why there is a problem with the input string in the first place. Most often, users have simply not specified the correct encoding.

For example, first re-encode the string into UTF-8, then pass it to htmlspecialchars() or htmlentities(). Since you mention smart quotes, you are probably dealing with a Windows-1252 encoded string. You don't even need to convert that one first; you can just specify the charset properly (PHP 5.3):

htmlentities($string, ENT_QUOTES, 'Windows-1252');

Naturally, this only works if the input $string is actually encoded in Windows-1252 (CP1252). Find out the correct encoding first; then it's normally no problem. For unsupported encodings, re-encode into a supported one first, for example with iconv or mbstring.
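Both routes can be sketched as follows, under the assumption that every input is either valid UTF-8 or Windows-1252. The helper name escape_cp1252_or_utf8() is hypothetical, and mb_check_encoding()/mb_convert_encoding() require the mbstring extension:

```php
<?php
// Hypothetical helper. ASSUMPTION: input is either valid UTF-8 or Windows-1252.
function escape_cp1252_or_utf8($s)
{
    // If the bytes are not valid UTF-8, treat them as Windows-1252 and re-encode.
    if (!mb_check_encoding($s, 'UTF-8')) {
        // iconv('Windows-1252', 'UTF-8', $s) would be an alternative here.
        $s = mb_convert_encoding($s, 'UTF-8', 'Windows-1252');
    }
    return htmlentities($s, ENT_QUOTES, 'UTF-8');
}

$cp1252 = "\x93quoted\x94";                   // smart quotes as raw Windows-1252 bytes
$utf8   = "\xE2\x80\x9Cquoted\xE2\x80\x9D";   // the same text already in UTF-8

// Route 1: tell htmlentities() the real charset, no conversion needed.
echo htmlentities($cp1252, ENT_QUOTES, 'Windows-1252'), "\n"; // &ldquo;quoted&rdquo;

// Route 2: normalise to UTF-8 first, then escape.
echo escape_cp1252_or_utf8($cp1252), "\n"; // &ldquo;quoted&rdquo;
echo escape_cp1252_or_utf8($utf8), "\n";   // &ldquo;quoted&rdquo;
```

The UTF-8 check is only a heuristic: a Windows-1252 string that happens to form valid UTF-8 would be misread, so knowing the real source encoding remains the better fix.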

hakre
If anyone lands here because they are following Lynda.com's MySQL Essential Training and ran into a "Use of undefined constant ENT_SUBSTITUTE" error on line 600 in Sid.php: based on what I read here, and since I could not find a clear equivalent flag for PHP 5.3, I went ahead and deleted that flag, and the app works fine now. There are security implications, as @hakre noted, but I'm working on a local development server with no access to the web. – Eric Hepperle - CodeSlayer2010 Jul 02 '15 at 01:51

As you say, these constants were added in 5.4.0. The thing is, the support for them is new in 5.4.0 as well. That means you can pass whatever values you want; older versions of htmlentities() will not understand them.

As is often the case, the PHP changelog is quite misleading here.
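To make that concrete, here is a small sketch that branches on PHP_VERSION rather than on whether the constant exists, because a user-land define() (as in the question) makes ENT_SUBSTITUTE "defined" on 5.3 without giving htmlentities() any new behaviour. The function name is made up, and the 5.3 fallback assumes you accept the trade-offs of ENT_IGNORE:

```php
<?php
// Hypothetical helper: choose the invalid-sequence strategy by PHP version.
function html_escape_lenient($s, $charset = 'UTF-8')
{
    if (version_compare(PHP_VERSION, '5.4.0', '>=')) {
        // 5.4+: invalid sequences become U+FFFD (raw bytes EF BF BD for UTF-8).
        return htmlentities($s, ENT_QUOTES | ENT_SUBSTITUTE, $charset);
    }
    // 5.3: no substitution support, so ENT_IGNORE drops invalid sequences
    // instead (discouraged by the manual).
    return htmlentities($s, ENT_QUOTES | ENT_IGNORE, $charset);
}

var_dump(html_escape_lenient('a & b'));                            // string(9) "a &amp; b"
var_dump(html_escape_lenient("caf\xE9") === "caf\xEF\xBF\xBD");    // bool(true) on PHP 5.4+
```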

Mikulas Dite