I have downloaded JSON data from Instagram that I'm parsing in Node.js and storing in MongoDB. The problem is that escaped Unicode characters don't render as the correct emoji on the client side.
For instance, here's a property from one of the JSON files I'm parsing and storing:
"title": "@mujenspirits is in the house!NEW York City \u00f0\u009f\u0097\u00bd\u00f0\u009f\u008d\u008e \nImperial Vintner Liquor Store"
The above example should display like this:
@mujenspirits is in the house!NEW York City 🗽🍎 Imperial Vintner Liquor Store
But instead looks like this:
@mujenspirits is in the house!NEW York City ðŸ—½ðŸŽ Imperial Vintner Liquor Store
I found another SO question where someone had a similar problem. Their solution works for me in the console on a simple string, but when I combine it with JSON.parse I still get the same incorrect display. This is what I'm using now to parse the JSON files:
import fs from 'fs'

export default function parseJsonFile(filepath: string) {
  const value = fs.readFileSync(filepath)
  const converted = new Uint8Array(
    new Uint8Array(Array.prototype.map.call(value, (c) => c.charCodeAt(0)))
  )
  return JSON.parse(new TextDecoder().decode(converted))
}
For posterity, I found an additional SO question similar to mine. It had no accepted solution, but one of the comments said:
The JSON files were generated incorrectly. The strings represent Unicode code points as escape codes, but are UTF-8 data decoded as Latin1
The commenter suggested encoding the loaded JSON to latin1 and then decoding it as utf8, but this didn't work for me either.
import buffer from 'buffer'
const value = fs.readFileSync(filepath)
const buffered = buffer.transcode(value, 'latin1', 'utf8')
return JSON.parse(buffered.toString())
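For reference, here is a minimal sketch of one possible reading of that comment. It assumes the round trip has to happen on each string after JSON.parse has turned the \u00XX escapes into characters. In the raw file those escapes are plain ASCII text, so transcoding the file's bytes before parsing leaves them unchanged. The names fixMojibake and parseInstagramJson are hypothetical, and whether every string in the export is affected is an assumption.

```typescript
// Hypothetical helper: treat each UTF-16 code unit of `text` as one byte
// (which is what encoding to latin1 does) and decode those bytes as UTF-8.
function fixMojibake(text: string): string {
  const bytes = new Uint8Array(text.length)
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i)
    // A code unit above 0xFF cannot be a single byte, so leave the string alone.
    if (unit > 0xff) return text
    bytes[i] = unit
  }
  try {
    // fatal: true makes invalid UTF-8 throw instead of yielding U+FFFD,
    // so text that was never mis-decoded (e.g. "café") is returned as-is.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes)
  } catch {
    return text
  }
}

// Hypothetical wrapper: apply the fix to every string value while parsing.
// Note that a reviver only sees values, so object keys are not repaired.
function parseInstagramJson(raw: string): unknown {
  return JSON.parse(raw, (_key, value) =>
    typeof value === 'string' ? fixMojibake(value) : value
  )
}

// The escapes from the title above, written here as a TypeScript string literal.
console.log(fixMojibake('\u00f0\u009f\u0097\u00bd\u00f0\u009f\u008d\u008e')) // 🗽🍎
```

With something like this, parseJsonFile could read the file as text (fs.readFileSync(filepath, 'utf8')) and pass it to parseInstagramJson. The try/catch fallback is a design choice to avoid damaging strings that were already correct.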
I know pretty much nothing about character encoding, so at this point I'm shooting in the dark searching for a solution.