After reading more of http://php.net/manual/en/function.htmlentities.php I noticed that htmlentities() doesn't encode all Unicode characters. Someone posted a superentities function in the comments, but it didn't work for me. The UTF8entities function did.
Here are two functions I modified from the comments section. While not exactly what I wanted, the code does give me HTML-encoded values.
$html = "<span>&#127363;&#127348;&#127362;&#127363;</span>";
$doc = new DOMDocument;
$doc->resolveExternals = false;
$doc->substituteEntities = false;
$doc->loadHTML($html);
foreach ($doc->getElementsByTagName('span') as $node) {
    var_dump(UTF8entities($node->nodeValue));
}
function UTF8entities($content = "") {
    // split into an array of single (possibly multi-byte) characters
    $characterArray = preg_split('/(?<!^)(?!$)/u', $content, -1, PREG_SPLIT_NO_EMPTY);
    $rv = ""; // initialise so the .= below doesn't hit an undefined variable
    foreach ($characterArray as $character) {
        $rv .= unicode_entity_replace($character);
    }
    return $rv;
}
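function unicode_entity_replace($c) { // m. perez
    // $c[0] instead of $c{0}: curly-brace offsets are deprecated in PHP 7.4 and removed in PHP 8
    $h = ord($c[0]); // the lead byte tells us how long the UTF-8 sequence is
    if ($h <= 0x7F) {
        return $c; // plain ASCII, leave as-is
    } else if ($h < 0xC2) {
        return $c; // not a valid lead byte, leave as-is
    }
    if ($h <= 0xDF) { // 2-byte sequence
        $h = ($h & 0x1F) << 6 | (ord($c[1]) & 0x3F);
        return "&#" . $h . ";";
    } else if ($h <= 0xEF) { // 3-byte sequence
        $h = ($h & 0x0F) << 12 | (ord($c[1]) & 0x3F) << 6 | (ord($c[2]) & 0x3F);
        return "&#" . $h . ";";
    } else if ($h <= 0xF4) { // 4-byte sequence, only the low 3 bits of the lead byte carry data
        $h = ($h & 0x07) << 18 | (ord($c[1]) & 0x3F) << 12 | (ord($c[2]) & 0x3F) << 6 | (ord($c[3]) & 0x3F);
        return "&#" . $h . ";";
    }
    return $c; // lead bytes above 0xF4 never occur in valid UTF-8
}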
Returns this:
string(36) "&#127363;&#127348;&#127362;&#127363;"
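If the mbstring extension is available, the same conversion can probably be done without decoding the bytes by hand. The following is only a minimal sketch under that assumption: mb_encode_numericentity() is a real mbstring function, but utf8_to_numeric_entities is a made-up helper name for illustration and is not part of the original comments or code.

```php
<?php
// Sketch only: assumes the mbstring extension is enabled.
// utf8_to_numeric_entities() is a hypothetical helper name.
function utf8_to_numeric_entities(string $content): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // Every character from U+0080 to U+10FFFF becomes a decimal entity;
    // ASCII characters are left untouched.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($content, $convmap, 'UTF-8');
}

// Same four characters as in the example above, written as code point escapes.
var_dump(utf8_to_numeric_entities("\u{1F183}\u{1F174}\u{1F182}\u{1F183}"));
```

This should give the same decimal entities as UTF8entities for valid UTF-8 input. How the two behave on malformed byte sequences may differ, so it is worth testing with your own data.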