
I'm trying to load /usr/share/matplotlib/sample_data/goog.npy:

datafile = matplotlib.cbook.get_sample_data('goog.npy', asfileobj=False)
np.load(datafile)

It's fine in Python 2.7, but raises an exception in Python 3.4:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd4 in position 1: ordinal not in range(128)

I assume it has something to do with the bytes/str/unicode differences between Python 2 and 3, but I have no idea how to work around it.

Question:

  • How do I load a .npy file (NumPy data) saved by Python 2 in Python 3?
Jonathan Leffler
Frozen Flame

3 Answers


The problem is that the file contains serialized (pickled) Python datetime objects, not just numerical data. The Python 2 serialization format for these objects cannot be read by Python 3 with its default settings:

python2
>>> import datetime
>>> import pickle
>>> pickle.dumps(datetime.datetime.now())
"cdatetime\ndatetime\np0\n(S'\\x07\\xde\\x06\\t\\x0c\\r\\x19\\x0f\\x1fP'\np1\ntp2\nRp3\n."

and

python3
>>> import pickle
>>> pickle.loads(b"cdatetime\ndatetime\np0\n(S'\\x07\\xde\\x06\\t\\x0c\\r\\x19\\x0f\\x1fP'\np1\ntp2\nRp3\n.")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xde in position 1: ordinal not in range(128)

A workaround is to change the following line in the NumPy source:

numpy/lib/format.py:
...
446         array = pickle.load(fp)

to array = pickle.load(fp, encoding="bytes"). A better solution would be for numpy.load to pass the encoding parameter through.
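
To see what the encoding argument does without patching NumPy, here is a minimal sketch using only the standard library. It reuses the datetime pickle shown above. The dict pickle is hand-written to mimic Python 2 protocol 0 output, and load_py2_pickle is a hypothetical helper name, not part of any library.

```python
import datetime
import pickle

# Protocol-0 pickle of a datetime as written by Python 2 (copied from the example above).
PY2_DATETIME = (
    b"cdatetime\ndatetime\np0\n"
    b"(S'\\x07\\xde\\x06\\t\\x0c\\r\\x19\\x0f\\x1fP'\np1\ntp2\nRp3\n."
)

# Assumption: hand-written protocol-0 pickle of {'name': 1}, mimicking Python 2 output.
PY2_DICT = b"(dp0\nS'name'\np1\nI1\ns."


def load_py2_pickle(data, encoding="bytes"):
    """Hypothetical helper: unpickle Python 2 data with an explicit encoding."""
    return pickle.loads(data, encoding=encoding)


# With the default encoding ("ASCII") the non-ASCII bytes cannot be decoded.
try:
    pickle.loads(PY2_DATETIME)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# With encoding="bytes" the Python 2 str stays a bytes object,
# which the datetime constructor accepts as pickled state.
stamp = load_py2_pickle(PY2_DATETIME)

# The same choice decides how Python 2 str dict keys come back.
as_bytes = load_py2_pickle(PY2_DICT, encoding="bytes")
as_latin1 = load_py2_pickle(PY2_DICT, encoding="latin1")

print(default_failed, stamp, as_bytes, as_latin1)
```

Note the trade-off: with encoding="bytes" every Python 2 str, including dict keys and field names, comes back as bytes, while encoding="latin1" gives ordinary str objects.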

pv.
  • I made the change you suggested, but an error was raised: `TypeError: an integer is required (got type str)`, pointing to line 446 of `numpy/lib/format.py`. My numpy version is `1.8.1` and my Python version is 3.4.0. – Frozen Flame Jun 09 '14 at 09:37
  • The correct choice seems to be `encoding="bytes"` rather than `encoding="latin1"`. – pv. Jun 09 '14 at 11:30
  • Doesn't work still. `TypeError: must be a unicode character, not bytes` – Frozen Flame Jun 10 '14 at 08:45
  • On a Windows system using Anaconda Python 3.4, it requires encoding="bytes" to read the file. Accessing the arrays takes b'' instead of just 'name'. On Linux, using 3.4, I need to use encoding='latin1' and just '' to access each of the arrays. – hknust Oct 09 '14 at 02:15
  • It's almost a year later and this still seems to be a problem, at least on OS X — is that the case for others as well? I'd really prefer to not change the original numpy code for the sake of loading some (but not all) of my .npy files (namely those that I saved before mostly migrating to python3). This seems like something that really should be brought up as a bug to the numpy developer community. – mpacer Apr 30 '15 at 22:51

In Python 3.5 with NumPy 1.10.4, the following command works for me:

D = np.load(file, encoding='latin1')

It fails with the same error message when I don't specify the encoding.

PhABC

One workaround that helped me is to dump the NumPy array loaded in Python 2 to a CSV file and then read it back in Python 3:

# Dump in Python 2
import matplotlib.cbook
import numpy as np

datafile = matplotlib.cbook.get_sample_data('goog.npy', asfileobj=False)
arr = np.load(datafile)
np.savetxt("np_arr.csv", arr, delimiter=",")

Now read the file back in Python 3:

# Load in Python 3
import numpy as np
arr = np.loadtxt("np_arr.csv", delimiter=",")
user1683894