
I want to convert a string from UTF-8 to ISO-8859-1 in PHP. (Actually, I want to remove all characters that are not in the ISO-8859-1 character set.)

$text = "test ‍♂️ test xäöüx x@x x€x";

$text = iconv('UTF-8', 'ISO-8859-1//IGNORE', $text);

The expected output would be: test test xäöüx x@x xx

But I get: test test x���x x@x xx

Why does iconv have problems with German umlauts? And why are they turned into question marks instead of being removed when in doubt?

fx123

1 Answer


The characters ä, ö and ü (U+00E4, U+00F6 and U+00FC, for what it's worth) are each encoded as a single byte in ISO-8859-1:

  • ä: E4
  • ö: F6
  • ü: FC

If we run a variation of your code with some additional debugging information:

$text = 'äöü';
$text = iconv('UTF-8', 'ISO-8859-1//IGNORE', $text);
echo bin2hex($text);

... we get the expected output:

e4f6fc
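
As a further check, here is a minimal sketch that converts the result back to UTF-8 and compares it with the input. It assumes PHP 7+ (for the `\u{...}` escapes, used so the result does not depend on how the script file itself is saved) and the iconv extension; the variable names are only illustrative. If the round trip reproduces the original bytes, the conversion did not damage the umlauts.

```php
<?php
// Build the input with escapes so the test is independent of the file's encoding.
$utf8 = "\u{E4}\u{F6}\u{FC}";                        // äöü as UTF-8 bytes

// No //IGNORE needed here: all three characters exist in ISO-8859-1.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

// Convert back; every ISO-8859-1 byte maps to exactly one Unicode code point.
$roundTrip = iconv('ISO-8859-1', 'UTF-8', $latin1);

echo bin2hex($utf8), "\n";      // c3a4c3b6c3bc (two bytes per character)
echo bin2hex($latin1), "\n";    // e4f6fc       (one byte per character)
var_dump($roundTrip === $utf8); // bool(true)
```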

The � you see can have several causes, all of them related to whatever rendering tool you are using (a web browser, I presume):

  • ISO-8859-1 not expected or supported.
  • Missing or incorrect encoding information.
  • Missing glyph in selected font (this is rare in browsers, since they use fallback fonts).
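
The first two causes can be illustrated with a short sketch. It shows that the ISO-8859-1 bytes are not valid UTF-8, which is what a renderer expecting UTF-8 would trip over. It also shows one possible way to drop characters outside ISO-8859-1 while keeping the string in UTF-8, so nothing has to be re-encoded for display. The helper `stripNonLatin1` is hypothetical (it is not part of PHP or of the code above), and it assumes ISO-8859-1 is taken as the full U+0000 to U+00FF range, control characters included. Another option would be to keep the ISO-8859-1 output and declare it, for example with `header('Content-Type: text/html; charset=ISO-8859-1')`.

```php
<?php
// Hypothetical helper: removes every code point above U+00FF and leaves
// the string encoded as UTF-8. Returns null if the input is not valid UTF-8.
function stripNonLatin1(string $utf8): ?string
{
    // The /u flag makes PCRE treat both pattern and subject as UTF-8.
    return preg_replace('/[^\x{00}-\x{FF}]/u', '', $utf8);
}

// äöü encoded as ISO-8859-1 (the bytes iconv produced above).
$latin1Bytes = "\xE4\xF6\xFC";

// An empty pattern with /u only validates the subject as UTF-8;
// preg_match() returns false here because these bytes are not valid UTF-8.
var_dump(preg_match('//u', $latin1Bytes)); // bool(false)

// The euro sign (U+20AC) is outside ISO-8859-1 and is removed;
// the umlauts stay and remain valid UTF-8.
$clean = stripNonLatin1("x\u{E4}\u{F6}\u{FC}x x\u{20AC}x");
echo $clean, "\n"; // xäöüx xx
```

Because the result is still UTF-8, it can be echoed into a UTF-8 page without any charset changes.
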
Álvaro González