2

I have a string that looks like this: "v\u00e4lkommen till mig", which I get after calling utf8_encode() on the string.

I would like that string to become

 välkommen till mig

where the character

  \u00e4 = ä = &auml;

How can I achieve this in PHP?

AlexanderNajafi

3 Answers

3
  • Do not use utf8_encode() or utf8_decode(). They only convert between ISO-8859-1 and UTF-8. ISO-8859-1 does not cover the same characters as ISO-8859-15 or Windows-1252, which are the most widely used encodings besides UTF-8. It is better to use mb_convert_encoding.

  • "v\u00e4lkommen till mig" > This looks like a JSON-encoded string, which is already UTF-8 encoded. The Unicode code point of "ä" is U+00E4, which JSON writes as \u00e4.
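To illustrate the first point, here is a minimal sketch of why naming the real source encoding matters. It assumes the mbstring extension is loaded and that the input bytes are Windows-1252; the sample bytes and variable names are made up for this example.

```php
<?php
// Hypothetical sample data: in Windows-1252, byte 0xE4 is "ä" and byte 0x80 is "€".
$cp1252 = "v\xE4lkommen \x80";

// Name the real source encoding explicitly.
$fromCp1252 = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');

// Treat the same bytes as ISO-8859-1, which is what utf8_encode() always assumes.
$fromLatin1 = mb_convert_encoding($cp1252, 'UTF-8', 'ISO-8859-1');

// Compare against the UTF-8 bytes of "välkommen €".
$expected = "v\xC3\xA4lkommen \xE2\x82\xAC";
var_dump($fromCp1252 === $expected); // the euro sign survives
var_dump($fromLatin1 === $expected); // byte 0x80 became a control character instead
```

The "ä" comes out the same either way, because both encodings place it at 0xE4. Only the characters where the two encodings differ, such as "€", are affected.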

Example

<?php
header('Content-Type: text/html; charset=utf-8');
$json = '"v\u00e4lkommen till mig"';
var_dump(json_decode($json)); //It will return a utf8 encoded string "välkommen till mig"

What is the source of this string?

There is no need to replace the ä with its HTML representation &auml; if you print it in a UTF-8 encoded document and tell the browser which encoding is used. If it is necessary, use htmlentities:

<?php
$json = '"v\u00e4lkommen till mig"';
$string = json_decode($json);
echo htmlentities($string, ENT_COMPAT, 'UTF-8');
BreyndotEchse
  • as he pointed out in a comment on my answer, he wants to keep `<>` untouched, so your `htmlentities` option will need adjusting for that. Serving the document as UTF-8 should work fine. – Dave Aug 10 '13 at 14:54
  • I am not outputting text/html. The content-type is text/plain; the output is used in an Android app. And yes, I am using json_decode. – AlexanderNajafi Aug 10 '13 at 15:28
  • Hm.. When you are using json_decode, *\u00e4* should not appear in this string anymore. You can serve text/plain in utf8 equally: `Content-Type: text/plain; charset=utf-8` (But I do not get this, as your output is HTML obviously). Please give us some more code to make it easier for us to help you ;) – BreyndotEchse Aug 10 '13 at 15:50
0

Edit: Since you want to keep HTML characters, and I now think your source string isn't quite what you posted (I think it is actual unicode, rather than containing \unnnn as a string), I think your best option is this:

$html = str_replace( '&amp;', '&', str_replace( '&gt;', '>', str_replace( '&lt;', '<', htmlentities( $whatever ) ) ) );

(note: no call to utf8_decode)
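The same idea can be sketched with built-in functions instead of nested replacements: encode everything, then decode only the special characters again. This assumes the input is UTF-8; the sample string is hypothetical.

```php
<?php
// Hypothetical input: UTF-8 text that already contains HTML markup.
$whatever = "<p>v\xC3\xA4lkommen till mig & dig</p>";

// Step 1: convert every character that has a named entity, including < > and &.
$encoded = htmlentities($whatever, ENT_NOQUOTES, 'UTF-8');

// Step 2: turn only &lt; &gt; and &amp; back into < > and &.
$html = htmlspecialchars_decode($encoded, ENT_NOQUOTES);

echo $html, "\n"; // <p>v&auml;lkommen till mig & dig</p>
```

ENT_NOQUOTES is used in both steps so that quotes are left alone in both directions.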

Original answer:

There is no direct conversion. First, decode it again:

$decoded = utf8_decode( $whatever );

then encode as HTML:

$html = htmlentities( $decoded );

and of course you can do it without a variable:

$html = htmlentities( utf8_decode( $whatever ) );

http://php.net/manual/en/function.utf8-decode.php

http://php.net/manual/en/function.htmlentities.php

To do this by regular expression (not recommended, likely slower, less reliable), you can use the fact that HTML supports &#xnnnn; constructs, where the nnnn is the same as your existing \unnnn values. So you can say:

$html = preg_replace( '/\\\\u([0-9a-f]{4})/i', '&#x$1;', $whatever );
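A small runnable sketch of that regular expression, assuming the string really contains a literal backslash followed by "u" and four hex digits (the sample input is hypothetical):

```php
<?php
// Single quotes: PHP keeps \u00e4 as six literal characters.
$whatever = '<p>v\u00e4lkommen till mig</p>';

// '\\\\' in PHP source becomes \\ in the pattern, which matches one literal backslash.
$html = preg_replace('/\\\\u([0-9a-f]{4})/i', '&#x$1;', $whatever);

echo $html, "\n"; // <p>v&#x00e4;lkommen till mig</p>
```

The tags are left untouched because only the escape sequences are rewritten. One caveat: characters outside the Basic Multilingual Plane are written as two \u escapes (a surrogate pair) in JSON, and this simple replacement would turn them into two invalid character references.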
Dave
  • Hi, your first solution works. But the thing is that (sorry for not saying earlier) I have HTML tags in my string as well, so a string can look like "välkommen" wrapped in an HTML tag, and the solution turns the "<" into &lt; – AlexanderNajafi Aug 10 '13 at 14:29
  • In that case you either need to add special cases for `<>&` (do a final replace to convert them back) or use the regular expression. If this is user-submitted input, a more common option is to use a restricted markup and only convert it to HTML as a last step. – Dave Aug 10 '13 at 14:32
  • Thanks, tried your regex, but it does not seem to work. Have you tested it? – AlexanderNajafi Aug 10 '13 at 14:34
  • @mr.axelander I have, but I'm thinking there might be some confusion about your original format. You may have better luck with this one: http://www.php.net/manual/en/function.utf8-decode.php#92777 – Dave Aug 10 '13 at 14:37
  • or see my edit. I'm still not quite sure what your source string is, but I believe that line will work. – Dave Aug 10 '13 at 14:42
0

The html_entity_decode worked for me.

$json = '"v\u00e4lkommen till mig"';
echo $decoded = html_entity_decode( json_decode($json) );
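For comparison, a short sketch of what each call contributes here, assuming the input is the JSON string from the question:

```php
<?php
$json = '"v\u00e4lkommen till mig"';

// json_decode() already resolves the \u00e4 escape into a UTF-8 "ä".
$decoded = json_decode($json);

// With no entities in the text, html_entity_decode() returns it unchanged.
$again = html_entity_decode($decoded, ENT_QUOTES, 'UTF-8');
var_dump($decoded === $again);

// It only makes a difference when the text contains entities such as &auml;.
$fromEntity = html_entity_decode('v&auml;lkommen till mig', ENT_QUOTES, 'UTF-8');
var_dump($fromEntity === $decoded);
```

Both comparisons should print bool(true), which suggests that json_decode() is the step doing the work for this particular input.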
Steffo Dimfelt