
I'm working with scripts in Jupyter notebooks. For code clarity, I would like to 'outsource' some definitions into a second script. But the definitions contain umlauts ('ü', 'ä', 'ö') and other non-ASCII characters. Definitions look like this:

an_outsourced_dict = {'Hello': 'Hallo', 'Door': 'Tür'}

After running into problems with 'importing' the second script with importlib, I'm now just running

%run myotherscript.ipynb

in the first script and have access to whatever is defined in myotherscript.ipynb afterwards.
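
For concreteness, the whole setup boils down to this (myotherscript.ipynb contains nothing but the definition above):

# cell in the main notebook
%run myotherscript.ipynb           # executes the other notebook in this namespace
print(an_outsourced_dict['Door'])  # should print 'Tür'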

The problem: the content of an_outsourced_dict in the calling script is {'Door': 'TÃ¼r', 'Hello': 'Hallo'}, and an_outsourced_dict['Door'] == 'Tür' returns False.

Also, when I add a print(an_outsourced_dict) to myotherscript.ipynb, it prints 'TÃ¼r' as well when called via %run. But defining and printing the dict directly in the main script gives 'Tür'.

(How) can this be solved? For now, I'll just put everything in one script.

This seems related, but my problem is not about writing anything to a file.


Additional information:

Python version 3.5; trying to get the IPython version as described here also gives an encoding error :), and I'm on Windows 8.1.

dasWesen
  • I cannot reproduce your problem (Xubuntu 18.10, Python 3.6.7rc1, Jupyter 4.4.0). I created a Python 3 notebook "parent.ipynb" which contains `print('Hello Tür')` and then did `%run parent.ipynb` in a second notebook which prints "Hello Tür". Encoding seems to be utf-8, I don't know why. Check with `import sys; sys.getdefaultencoding()`. – lumbric Oct 27 '18 at 17:05
  • @lumbric, thanks for trying to reproduce. My setting was a tiny bit different: I define the dict and then print(thedict), not just print("string with lots of ü's"). Yes, the default encoding with Python 3 seems to be utf-8 (from the other questions I found on this when searching); that's also the output when I run sys.getdefaultencoding() – 'utf-8'. – dasWesen Oct 27 '18 at 17:21
  • Encoding seems to get messed up in between the scripts somehow – dasWesen Oct 27 '18 at 17:30
  • I could reproduce with Python 3.3 (notebook 4.2.3), but running with Python 3.7 (notebook 5.7.0) and refreshing the cells printed correctly. It looks like the older Jupyter notebook decoded the imported script with the Windows default ANSI encoding (probably Windows-1252) and the newer defaults to UTF-8. The .ipynb content was encoded as UTF-8. So it looks like it was just a bug in notebook or its dependencies that's been fixed in newer versions. – Mark Tolonen Oct 29 '18 at 07:07
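
Mark Tolonen's diagnosis is easy to check in isolation: the UTF-8 bytes of 'ü' (0xC3 0xBC), read as Windows-1252, come out as 'Ã¼'. A minimal sketch, assuming that cp1252 misdecoding:

# 'Tür' encoded as UTF-8, then wrongly decoded as Windows-1252 (cp1252)
broken = 'Tür'.encode('utf-8').decode('cp1252')
print(broken)           # TÃ¼r
print(broken == 'Tür')  # False – the same mismatch as in the question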

1 Answer


I've faced this problem before. You can try dumping your dictionary to a .json file like this:

import json

# write the dictionary as UTF-8 encoded JSON
with open('output.json', 'w', encoding='utf-8') as output_file:
    json.dump(your_dictionary, output_file, ensure_ascii=False, indent=4)
    output_file.write("\n")  # trailing newline at the end of the file

ensure_ascii=False makes sure the characters written to the .json file are the ones you expect ('ü' stays 'ü' instead of being escaped as '\u00fc').
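
To get the definitions into the calling notebook, read the file back in (a minimal sketch, assuming the output.json written above):

import json

# load the dictionary back with an explicit encoding
with open('output.json', encoding='utf-8') as input_file:
    an_outsourced_dict = json.load(input_file)

print(an_outsourced_dict['Door'] == 'Tür')  # True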

lamhoangtung
  • Thanks, that would probably work as a workaround, yes... Maybe it's even cleaner in this situation to load from a .json than to %run another script, even though you have to go via another format. – dasWesen Oct 27 '18 at 17:27
  • @dasWesen If you have fewer than 10 million keys in your dictionary, it won't affect the performance of your program at all. Maybe you can try something similar to `ensure_ascii=False` to solve the problem – lamhoangtung Oct 28 '18 at 00:53
  • Changing Python versions seems a bit too much, so I'm using your workaround now. Thanks for the parameter settings, works great so far. – dasWesen Oct 29 '18 at 11:13