I am working with a file that contains Unicode emoji. The original file looks fine: I can see the emoji. But when I read it with the json module and write it back, each emoji is transformed into something like "\ud83d\ude00". So my emoji "😀" becomes "\ud83d\ude00" after writing. I am using the code below:

import json

with open("emoji-by-category.json", encoding='utf-8', errors='ignore') as json_data:
    data = json.load(json_data, strict=False)

with open("emoji-by-category2.json", 'w', encoding='utf-8', errors='ignore') as json_file:
    json.dump(data, json_file, indent=4)

Here is an example of the JSON file after writing:

[
    {
        "category": "Smileys & Emotion",
        "section": "face-smiling",
        "n": "1",
        "code": "U+1F600",
        "text": "\ud83d\ude00",
        "recentlyAdded": false,
        "name": "grinning face",
        "vendors": {
            "Appl": true,
            "Goog": true,
            "FB": true,
            "Wind": true,
            "Twtr": true,
            "Joy": true,
            "Sams": true,
            "GMail": true,
            "SB": false,
            "DCM": false,
            "KDDI": false,
            "Tlgr": true
        },
        "tags": [
            "face",
            "grin",
            "grinning face"
        ],
        "keywords": [
            "face",
            "grin",
            "grinning",
            "subdivision",
            "flag",
            ":D",
            "grinning face"
        ]
    },
    {
        "category": "Smileys & Emotion",
        "section": "face-smiling",
        "n": "2",
        "code": "U+1F603",
        "text": "\ud83d\ude03",
        "recentlyAdded": false,
        "name": "grinning face with big eyes",
        "vendors": {
            "Appl": true,
            "Goog": true,
            "FB": true,
            "Wind": true,
            "Twtr": true,
            "Joy": true,
            "Sams": true,
            "GMail": true,
            "SB": true,
            "DCM": true,
            "KDDI": true,
            "Tlgr": true
        },
        "tags": [
            "face",
            "grinning face with big eyes",
            "mouth",
            "open",
            "smile"
        ],
        "keywords": [
            "face",
            "grinning",
            "big",
            "eyes",
            "mouth",
            "open",
            "smile",
            "subdivision",
            "flag",
            "grin",
            "eye",
            ":D",
            ":)",
            "grinning face with big eyes"
        ]
    },
    {
        "category": "Smileys & Emotion",
        "section": "face-smiling",
        "n": "3",
        "code": "U+1F604",
        "text": "\ud83d\ude04",
        "recentlyAdded": false,
        "name": "grinning face with smiling eyes",
        "vendors": {
            "Appl": true,
            "Goog": true,
            "FB": true,
            "Wind": true,
            "Twtr": true,
            "Joy": true,
            "Sams": true,
            "GMail": true,
            "SB": true,
            "DCM": false,
            "KDDI": false,
            "Tlgr": true
        },
        "tags": [
            "eye",
            "face",
            "grinning face with smiling eyes",
            "mouth",
            "open",
            "smile"
        ],
        "keywords": [
            "eye",
            "face",
            "grinning",
            "smiling",
            "eyes",
            "mouth",
            "open",
            "smile",
            "subdivision",
            "flag",
            "grin",
            "joy",
            "funny",
            "haha",
            "laugh",
            ":D",
            ":)",
            "grinning face with smiling eyes"
        ]
    }
]
Zubayer

1 Answer

Pass ensure_ascii=False to json.dump when writing:

with open("emoji-by-category2.json", 'w', encoding='utf-8', errors='ignore') as json_file:
    json.dump(data, json_file, indent=4, ensure_ascii=False)

See the Python documentation for the json module (JSON encoder and decoder):

Basic Usage:

json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

… If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is. …
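
To make the difference concrete, here is a minimal sketch using json.dumps on a small dictionary modeled on one entry of the file above. The variable names (entry, escaped, literal) are only illustrative and not part of the original code. It also shows that both outputs are valid JSON and decode to the same Python string, so the escaped form loses no data; it is just harder to read.

```python
import json

# One illustrative entry; "\U0001F600" is the grinning face emoji (U+1F600).
entry = {"code": "U+1F600", "text": "\U0001F600"}

# Default (ensure_ascii=True): non-ASCII characters are written as \uXXXX
# escapes. Characters outside the Basic Multilingual Plane become a
# UTF-16 surrogate pair, here \ud83d\ude00.
escaped = json.dumps(entry)

# ensure_ascii=False: the emoji is kept as a literal character, so the
# output file must be opened with an encoding that can represent it,
# such as utf-8.
literal = json.dumps(entry, ensure_ascii=False)

# Both forms parse back to exactly the same data.
assert json.loads(escaped) == json.loads(literal) == entry
```

Here escaped holds {"code": "U+1F600", "text": "\ud83d\ude00"}, while literal holds the same object with the emoji itself in the "text" field.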

JosefZ