
My website content is based on an XML file that I receive from an external partner.

The XML contains Unicode characters (for example Δ), and my problem is that my (UTF-8 encoded) website does not display them correctly. Instead of Δ I only get a ?

Can I resolve this on my end (and if so, how), or do I have to ask the external partner to send the XML again with entities?

<table>
<tbody>
    <tr>
        <td>Δ Test</td>
    </tr>
</tbody>
</table>
JonSnow
  • `'Δ Test'.encode('ascii', 'xmlcharrefreplace').decode()` returns `'&#916; Test'` in Python. – JosefZ May 12 '22 at 18:27
  • Still not solved. If I have a Unicode symbol in my source code (and the web browser does not recognize it, displaying a question mark instead): how can I display it? Do I have to automatically encode the whole body text? (And Python is not an option here.) – JonSnow May 18 '22 at 13:30

1 Answer


You can use `&#` followed by the decimal Unicode code point and a semicolon. For Δ that should be `&#916;`.
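The replacement does not have to be done by hand or per character. As a minimal sketch, assuming a Python step is available somewhere in the pipeline (the function name `to_char_refs` is made up for illustration), the built-in `xmlcharrefreplace` error handler converts every non-ASCII character into its numeric reference and leaves ASCII markup untouched:

```python
def to_char_refs(text: str) -> str:
    """Replace each non-ASCII character with a numeric character reference."""
    # Encoding to ASCII cannot represent characters such as Δ; the
    # 'xmlcharrefreplace' handler substitutes &#NNN; instead of raising.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_char_refs("<td>Δ Test</td>"))  # <td>&#916; Test</td>
```

Because only characters outside ASCII are rewritten, the whole body text can be passed through without first identifying which symbols are affected.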

Poder Psittacus
  • Yes, I know. But I only have the Unicode character itself available in the given code. How can I display the character without having to (manually) change the character to the Unicode number? – JonSnow May 12 '22 at 09:22
  • You could just write a short program in Python, like `string = 'disajfodj Δ adsod '` followed by `string.replace('Δ', '&#916;')`, if that is an option – Poder Psittacus May 12 '22 at 09:26
  • Or use the PHP function `utf8_encode()` – Poder Psittacus May 12 '22 at 09:28
  • Or use JavaScript: `function encode_utf8( s ){ return unescape( encodeURIComponent( s ) ); }`, called for example as `encode_utf8( '\u4e0a\u6d77' )` – Poder Psittacus May 12 '22 at 09:30
  • https://stackoverflow.com/questions/10576905/how-to-convert-javascript-unicode-notation-code-to-utf-8 – Poder Psittacus May 12 '22 at 09:30
  • But in every case I would have to identify the symbol as a Unicode character first. I don't see how this could happen "on the fly"? – JonSnow May 12 '22 at 11:42
  • I mean you could just convert the whole text; the text should stay the same because UTF-8 is a smaller part of Unicode, if I recall correctly. Or at least you could try that, maybe it works XD – Poder Psittacus May 12 '22 at 12:25
  • Sorry, I meant to say that ASCII is a part of Unicode and UTF-8 only encodes non-ASCII characters differently, so your text would not be touched, only the special Unicode symbols – Poder Psittacus May 12 '22 at 12:35
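To illustrate the last point about ASCII, here is a small sketch (in Python, purely for demonstration) showing that encoding to UTF-8 leaves ASCII characters as unchanged single bytes, while only Δ becomes a multi-byte sequence:

```python
text = "Δ Test"
utf8_bytes = text.encode("utf-8")

# Δ (U+0394) becomes the two bytes CE 94; " Test" stays byte-for-byte ASCII.
print(utf8_bytes)  # b'\xce\x94 Test'

# Pure ASCII text is identical whether encoded as ASCII or as UTF-8.
assert "Test".encode("utf-8") == "Test".encode("ascii")
```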