I have a few large JSON files from a source I don't control that I'm trying to clean up in Notepad++ before using them as program input.
The files contain a lot of Unicode escape sequences, which I unfortunately know very little about. They're the kind where two or three escapes stand for a single character, such as \u00c3\u00a9 for é, and \u00e2\u0080\u0094 for an em dash (—).
I've spent all night Googling how to convert these back into normal characters, but unfortunately I don't understand much of what I came across.
I did eventually figure out that by installing the HTML Tag plugin, I can run "Decode JS" on the escapes, then convert the whole file to ANSI, and then have Notepad++ reinterpret it as UTF-8. That fixes most of the characters.
But some, such as the em dash or Ç (\u00c3\u0087), still refuse to be converted.
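For what it's worth, the sequences themselves don't seem to be broken. Here is a minimal Python sketch of what I think is going on, assuming each \u00XX escape holds one byte of the character's UTF-8 encoding. The function name `fix_mojibake` is just something I made up for illustration, and I'm using Latin-1 rather than ANSI for the intermediate step, which is my guess at what matters here:

```python
import json

def fix_mojibake(json_string_literal: str) -> str:
    """Undo the 'UTF-8 bytes stored as \\u00XX escapes' pattern.

    Assumption: every character in the decoded string is below U+0100
    and stands for exactly one byte of the original UTF-8 text.
    """
    # Step 1: let the JSON parser turn \u00c3\u00a9 into two characters.
    text = json.loads(json_string_literal)
    # Step 2: Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    # so this recovers the raw bytes without losing 0x80..0x9F.
    raw_bytes = text.encode("latin-1")
    # Step 3: read those bytes as the UTF-8 they originally were.
    return raw_bytes.decode("utf-8")

# The three sequences from my files, written as JSON string literals.
samples = [
    r'"\u00c3\u00a9"',        # should be e with acute accent
    r'"\u00e2\u0080\u0094"',  # should be an em dash
    r'"\u00c3\u0087"',        # should be C with cedilla
]
results = [fix_mojibake(s) for s in samples]
```

With this, all three samples come out as the expected characters, including the two that fail in Notepad++. Both failing ones contain an escape in the \u0080 to \u009f range, so I suspect the ANSI step is where they get lost, but I don't know how to confirm that or work around it.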
Can someone please point me toward why these particular characters still display incorrectly, and how I can fix it? Thanks.