I'm trying to import the MNIST dataset in Python as follows:

import h5py
import numpy as np  # needed for np.ones below

f = h5py.File("mnist.h5", "r")  # open read-only
x_test = f["x_test"]
x_train = f["x_train"]
y_test = f["y_test"]
y_train = f["y_train"]

The type of, say, y_train is h5py._hl.dataset.Dataset.

I want to convert them to float for mathematical convenience. I tried this:

D = x_train.astype(float)
y_train = y_train.astype(float)+np.ones((60000,1));

but I get this traceback:

Traceback (most recent call last):

  File "<ipython-input-14-f3677d523d45>", line 1, in <module>
    y_train = y_train.astype(float)+np.ones((60000,1));

TypeError: unsupported operand type(s) for +: 'AstypeContext' and 'float'

What am I missing? Thanks.

srkdb
  • As a side note: that error is unusual and reasonably interesting to investigate, but your title refers to the fact that some code works in MATLAB. Then I looked at your previous questions and they're almost all `v/s MATLAB` in the title. They're different languages. There's no reason why someone who knows how this works in MATLAB will know the direct equivalent in Python. Please consider more descriptive titles about what the issue is you're facing _within python_ and drop the MATLAB reference because your problem simply doesn't exist in that language. – roganjosh Jun 22 '18 at 16:32
  • @roganjosh: Thanks for the tip. I'm reasonably familiar with MATLAB and trying to translate some code into Python. Hence the dilemma. I'll drop the MATLAB reference from now on. I've edited the question to reflect this and I'm looking forward to your input. – srkdb Jun 22 '18 at 18:15
  • Did you try separating this into two lines: `y_train = y_train.astype(float)` and then `y_train = y_train+np.ones((60000,1));`? It at least shouldn't give the same error. – Zev Jun 22 '18 at 18:19
  • @Zev: I did. It still gives the same error. Surprisingly, after the import, I can't even see the variables in my Variable Explorer (Spyder IDE) – srkdb Jun 22 '18 at 18:31

1 Answer

You are using two different libraries that have two completely different meanings for astype.

If you were doing this in numpy, something like this works:

import numpy as np

a = np.array([1, 2, 3])
# astype returns a new float array; the sum broadcasts to shape (60000, 3)
a = a.astype(float) + np.ones((60000, 1))

But in h5py, astype is a different function, meant to be used as a context manager.

This throws the same error you are getting:

import h5py
import numpy as np

f = h5py.File('mytestfile.hdf5', 'w')
dset = f.create_dataset("default", (100,))
# raises TypeError: astype returned an AstypeContext, not an array
dset.astype(float) + np.ones((60000,1))
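
To see why the + fails, here is a toy sketch in plain Python. ToyDataset and ToyAstypeContext are hypothetical stand-ins written only for this illustration, not h5py's actual classes. They mimic the idea that astype hands back a context-manager object with no arithmetic operators, and that the conversion only applies to reads made inside the with block:

```python
class ToyAstypeContext:
    """Hypothetical stand-in: a context manager with no arithmetic methods."""

    def __init__(self, dataset, dtype):
        self._dataset = dataset
        self._dtype = dtype

    def __enter__(self):
        # While the block is active, reads from the dataset are converted.
        self._dataset._read_dtype = self._dtype
        return self

    def __exit__(self, *exc_info):
        # Restore the default behaviour when the block ends.
        self._dataset._read_dtype = None
        return False


class ToyDataset:
    """Hypothetical stand-in for a dataset that stores integers."""

    def __init__(self, values):
        self._values = list(values)
        self._read_dtype = None

    def astype(self, dtype):
        # Returns a context object, NOT converted data.
        return ToyAstypeContext(self, dtype)

    def __getitem__(self, key):
        convert = self._read_dtype or (lambda v: v)
        return [convert(v) for v in self._values[key]]


dset = ToyDataset([1, 2, 3])

# Adding to the context object fails, because it defines no __add__.
try:
    dset.astype(float) + 1.0
except TypeError as exc:
    error_message = str(exc)

# Reading inside the with block yields converted values.
with dset.astype(float):
    converted = dset[:]
```

Here error_message has the same shape as the message in the question's traceback, with the toy class name in place of AstypeContext, and converted holds floats because the read happened inside the with block.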

But the code below will work (see astype in the h5py docs):

f = h5py.File('mytestfile.hdf5', 'w')
dset = f.create_dataset("default", (100,))

with dset.astype('float'):
    out = dset[:]
    out += np.ones((100,))
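
If you would rather not depend on the context manager at all (as far as I know, newer h5py releases have moved away from that form), a simple alternative is to slice the dataset with [:], which reads it into an ordinary NumPy array, and then call NumPy's own astype on the result. This is only a minimal sketch: the file name toy_labels.h5 and the small five-row dataset are made up here so the example runs without the real mnist.h5.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file (hypothetical name and contents).
path = os.path.join(tempfile.mkdtemp(), "toy_labels.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("y_train", data=np.arange(5, dtype="uint8").reshape(5, 1))

# Slicing with [:] reads the dataset into a NumPy array;
# .astype(float) is then NumPy's method and returns a float array.
with h5py.File(path, "r") as f:
    y_train = f["y_train"][:].astype(float)

# y_train is now a plain ndarray, so ordinary arithmetic works,
# even after the file has been closed.
y_train = y_train + 1.0
```

Note that this copies the whole dataset into memory, which is fine for something the size of MNIST but may matter for very large files.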

This problem is similar to "Creating reference to HDF dataset in H5py using astype".

Zev