
I have a problem converting Unicode escape sequences to UTF-8. Here is my code:

<?php 
    $unicode = '\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d';

    $utf8string = html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", $unicode), ENT_NOQUOTES, 'UTF-8');

    echo $utf8string;
?>

It outputs the following, unchanged:

\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d

What did I do wrong? Any advice?

batgerel.e
  • See the PHP 7 Unicode codepoint escape syntax here: http://php.net/manual/en/migration70.new-features.php#migration70.new-features.unicode-codepoint-escape-syntax and this related question: https://stackoverflow.com/questions/1805802/php-convert-unicode-codepoint-to-utf-8 – Jeff Nov 22 '18 at 02:24
  • Possible duplicate of [PHP: Convert unicode codepoint to UTF-8](https://stackoverflow.com/questions/1805802/php-convert-unicode-codepoint-to-utf-8) – Jeff Nov 22 '18 at 02:26
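The first link describes the \u{...} escape added in PHP 7.0. A small sketch, assuming PHP 7.0 or later, of why PHP itself never converts the string in the question: escapes are only processed in double-quoted strings, and only with the braced form.

```php
<?php
// Single quotes: PHP does not process \u, so this stays six literal characters.
$literal = '\u0411';

// Double quotes with braces (PHP 7.0+): the codepoint escape yields UTF-8 bytes.
$escaped = "\u{0411}";

var_dump(strlen($literal));          // int(6)
var_dump(strlen($escaped));          // int(2): "Б" is 0xD0 0x91 in UTF-8
var_dump(bin2hex($escaped));         // string(4) "d091"
```

So a string that arrives with literal \uXXXX sequences (for example from JSON or JavaScript) has to be decoded at runtime, which is what the answer addresses.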

1 Answer


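At the very least, your regular expression looks for the U+XXXX notation (an uppercase U followed by a plus sign), while your string uses \uXXXX escape sequences with a lowercase u. The character class [0-9A-F] also matches only uppercase hex digits, so escapes such as \u044d would not match either.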
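For comparison, here is a sketch of the question's own approach with only the pattern corrected: it matches a literal backslash, a lowercase u, and four hex digits in either case, then lets html_entity_decode turn the resulting numeric entities into UTF-8.

```php
<?php
$unicode = '\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d';

// In a single-quoted PHP string '\\\\' becomes two backslashes, which the
// regex engine reads as one literal backslash. Hex digits may be a-f or A-F.
$entities = preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x$1;', $unicode);

// Decode numeric entities such as &#x0411; into UTF-8 characters.
$utf8string = html_entity_decode($entities, ENT_NOQUOTES, 'UTF-8');

echo $utf8string; // Б. Цэцэгсүрэн
```

This works, but it still round-trips through HTML entities.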

Beyond that, your conversion makes a round trip: from JavaScript-style escaped Unicode characters, to HTML entities, and back to a PHP string. This might be a saner solution (for this string):

$unicode = '\u0411. \u0426\u044d\u0446\u044d\u0433\u0441\u04af\u0440\u044d\u043d';
echo json_decode('"' . $unicode . '"');

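Be careful, though: this might break if the input string contains newlines or unescaped double quotes, because those make the wrapped string invalid JSON, and json_decode then returns null.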
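One way around that caveat, sketched here as an alternative rather than as part of the answer, is to replace each escape individually with a callback, so quotes and newlines elsewhere in the string are never parsed. The helper names codepointToUtf8 and decodeUnicodeEscapes are hypothetical. The sketch assumes four-digit \uXXXX escapes only and does not combine surrogate pairs such as \ud83d\ude00 (json_decode does). On PHP 7.2+ with the mbstring extension, mb_chr($cp, 'UTF-8') could replace the hand-written encoder.

```php
<?php
// Hypothetical helper: encode a single Unicode code point as UTF-8 bytes.
function codepointToUtf8(int $cp): string
{
    if ($cp < 0x80) {            // 1 byte: plain ASCII
        return chr($cp);
    }
    if ($cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))   // 4 bytes for code points above U+FFFF
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: replace every \uXXXX escape; all other text is untouched.
function decodeUnicodeEscapes(string $s): string
{
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function (array $m): string {
            return codepointToUtf8(hexdec($m[1]));
        },
        $s
    );
}

// Input with a newline and double quotes, which would make json_decode fail.
$input = "line one\n" . 'He said "\u0411"';
echo decodeUnicodeEscapes($input), "\n";
```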

Evert