
I have a pandas DataFrame containing the Nordic letters æ, ø and å, and I want to convert it to JSON. Everything works except that these letters are escaped in the JSON output; for instance, å is written as "\\u00e5". I have tried the following:

```python
import json

df_qnapairs.questions = df_qnapairs.questions.str.encode('utf-8')

json_dump = json.dumps(df_qnapairs.to_json(orient='records'), ensure_ascii=False)

json_dump
```

However, the output looks exactly the same: æ, ø and å are still escaped.
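A minimal sketch of why the attempt above keeps the escapes, assuming a small hypothetical DataFrame `df` in place of `df_qnapairs` (the `.str.encode('utf-8')` step is left out). `to_json` already returns a JSON string with non-ASCII characters escaped, and `json.dumps` then serializes that string a second time, so `ensure_ascii=False` never sees an "å" to preserve:

```python
import json

import pandas as pd

# Hypothetical stand-in for df_qnapairs (column name taken from the question).
df = pd.DataFrame({"questions": ["Hvad er blåbær?", "Hvor ligger Ærø?"]})

# Step 1: to_json returns a JSON *string*. With the default force_ascii=True,
# every non-ASCII character is written as a \uXXXX escape.
records_json = df.to_json(orient="records")
print(records_json)

# Step 2: json.dumps encodes that string again. ensure_ascii=False has no
# non-ASCII characters left to keep; it only sees backslashes, which it
# escapes once more, giving \\u00e5 in the result.
double_encoded = json.dumps(records_json, ensure_ascii=False)
print(double_encoded)
```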

Any suggestions? This is in Databricks with Python.

Andreas
  • Does this answer your question? [How the keep the original value of unicode characters even after writing it to a json file?](https://stackoverflow.com/questions/66132291/how-the-keep-the-original-value-of-unicode-characters-even-after-writing-it-to-a) – JosefZ Feb 25 '21 at 16:05
  • @JosefZ This does not answer my question: I have a DataFrame, whereas the question you link to works with JSON from the start. Also, as I stated, I have already tried the ensure_ascii = False parameter without success. – Andreas Feb 26 '21 at 07:39

1 Answer


After a while, I found the solution: the problem was the to_json function. By default it escapes non-ASCII characters, so I needed to add the force_ascii = False parameter:

```python
df_qnapairs.to_json(orient='records', force_ascii=False)
```
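A self-contained sketch of the fix, again assuming a hypothetical DataFrame `df` in place of `df_qnapairs` and a hypothetical output file name `qnapairs.json`. No `json.dumps` call is involved, because `to_json` already produces the JSON text; the file is opened with an explicit UTF-8 encoding so the letters are stored as-is regardless of the platform default:

```python
import json

import pandas as pd

# Hypothetical stand-in for df_qnapairs.
df = pd.DataFrame({"questions": ["Hvad er blåbær?", "Hvor ligger Ærø?"]})

# force_ascii=False keeps æ, ø and å as literal characters instead of \uXXXX escapes.
records_json = df.to_json(orient="records", force_ascii=False)
print(records_json)

# Write the JSON text with an explicit UTF-8 encoding.
# "qnapairs.json" is a hypothetical file name.
with open("qnapairs.json", "w", encoding="utf-8") as f:
    f.write(records_json)

# Round-trip check: parsing the file gives back the original characters.
with open("qnapairs.json", encoding="utf-8") as f:
    records = json.load(f)
print(records[0]["questions"])  # prints: Hvad er blåbær?
```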
Andreas