I have received some XML from an upstream data source.
I'm not sure whether these weird characters are valid UTF-8 or whether the upstream source has messed things up, i.e. bad data in => bad data out.
I'm guessing the following is what was passed down:
Value in XML file | Unicode Value | UTF-8 Value | English Description
-------------------------------------------------------------------------------------------
’ | U+2019 | \xe2\x80\x99 | RIGHT SINGLE QUOTATION MARK
• | U+2022 | \xe2\x80\xa2 | BULLET
&amp; | -not unicode- | -- | Ampersand, HTML encoded.
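To sanity-check the Unicode and UTF-8 columns, here is a minimal sketch in Python. The choice of Python is an assumption on my part, and `describe` is just a hypothetical helper name. It prints the code point, the UTF-8 bytes and the official Unicode name for the two characters:

```python
import unicodedata

def describe(ch: str) -> tuple[str, bytes, str]:
    """Return (code point, UTF-8 bytes, Unicode name) for one character."""
    return (f"U+{ord(ch):04X}", ch.encode("utf-8"), unicodedata.name(ch))

# The two non-ASCII characters I think were passed down.
for ch in ("\u2019", "\u2022"):
    code_point, utf8_bytes, name = describe(ch)
    print(code_point, utf8_bytes, name)

# Going the other way: the \x.. notation is only a way of writing raw bytes,
# so decoding those bytes as UTF-8 should give the character back.
print(b"\xe2\x80\x99".decode("utf-8"))
```

If the bytes in the file match what this prints, then the table describes plain, valid UTF-8.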
I feel like the \ at the start of the UTF-8 value is sort of... encoded, but done wrong?
Can someone please explain what I'm looking at, so I know how to correctly decode it? What's also frustrating is that I feel like this could be a mix of encodings, which would make things awful :(
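This is the kind of check I have in mind, as a rough Python sketch rather than a known-good solution. Both function names are hypothetical. The second function assumes that any damage came from UTF-8 bytes being misread as Windows-1252, which is only a guess about what the upstream source might have done:

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def undo_cp1252_mojibake(text: str) -> str:
    """Reverse 'UTF-8 bytes misread as Windows-1252', if that is what happened.

    Returns the text unchanged when the round trip is not possible.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(is_valid_utf8(b"\xe2\x80\x99"))  # complete three-byte sequence
print(is_valid_utf8(b"\xe2\x80"))      # truncated sequence
print(undo_cp1252_mojibake("\u00e2\u20ac\u2122"))  # the three characters "â€™"
```

If the file really is a mix of encodings, a check like this would have to run per field or per line rather than on the whole file at once.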
Reference: http://utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128&utf8=string-literal