How to properly encode data in JSON from Dataframe in Pandas

Question

I have a pandas DataFrame containing the Nordic letters æ, ø and å, and I want to convert it to JSON. Everything works except that å, for instance, is written as "\\u00e5" in the JSON output. I have tried the following:

import json

df_qnapairs.questions = df_qnapairs.questions.str.encode('utf-8')

json_dump = json.dumps(df_qnapairs.to_json(orient = 'records'), ensure_ascii = False)

json_dump

However, the output looks just the same and it does not handle æ, ø, å.

Any suggestions? This is in Databricks with Python.

Does this answer your question? [How the keep the original value of unicode characters even after writing it to a json file?](https://stackoverflow.com/questions/66132291/how-the-keep-the-original-value-of-unicode-characters-even-after-writing-it-to-a) — JosefZ, Feb 25 '21 at 16:05
@JosefZ this does not answer my question, I have a dataframe, the link you are referring to is using JSON from the get go. Also, as I have stated, I have tried the ensure_ascii = False parameter without success. — Andreas, Feb 26 '21 at 07:39

score 3 · Answer 1 · answered Feb 26 '21 at 08:17


After a while I found the solution: the problem was the to_json call itself. By default it escapes non-ASCII characters, so I needed to pass the force_ascii = False parameter:

df_qnapairs.to_json(orient = 'records', force_ascii = False)
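For anyone wanting to see the difference side by side, here is a minimal sketch. The DataFrame contents are made up for illustration (the real df_qnapairs is not shown in the question). It also shows why wrapping the result in json.dumps does not help: to_json already returns a JSON string, so json.dumps just encodes that string a second time and doubles the backslashes.

```python
import json

import pandas as pd

# Hypothetical stand-in for the df_qnapairs DataFrame from the question.
df_qnapairs = pd.DataFrame({"questions": ["Hvor går båten?", "Liker du blåbær?"]})

# Default behaviour (force_ascii=True): non-ASCII characters are escaped.
escaped = df_qnapairs.to_json(orient="records")

# With force_ascii=False the characters are written as-is.
readable = df_qnapairs.to_json(orient="records", force_ascii=False)

# Both strings decode to the same data; only the text representation differs.
same_data = json.loads(escaped) == json.loads(readable)

# The attempt from the question: json.dumps on an already-serialized string
# encodes it again, so each backslash in the escape sequence gets doubled.
double_encoded = json.dumps(escaped, ensure_ascii=False)

print(escaped)
print(readable)
print(double_encoded)
```

If the result is written to a file, opening the file with encoding="utf-8" should keep the letters intact on disk as well.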


Andreas
