
I cannot download the Keras MNIST dataset the usual way because of a proxy.

So I have downloaded a local copy from here: https://s3.amazonaws.com/img-datasets/mnist.pkl.gz

I am loading it into my notebook with the following code:

import gzip
import pickle
import sys  # needed for the sys.version_info check below
f = gzip.open('mnist.pkl.gz', 'rb')
if sys.version_info < (3,):
    data = pickle.load(f)
else:
    data = pickle.load(f, encoding='bytes')
f.close()
print(data)
(X_train, y_train), (X_test, y_test) = data
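If you load the file more than once, the steps above can be wrapped in a small function. This is only a sketch, assuming Python 3 and a file that unpickles to ((X_train, y_train), (X_test, y_test)) as yours does; load_local_mnist is a hypothetical helper name, not part of Keras. The with statement closes the file even if unpickling fails.

```python
import gzip
import pickle


def load_local_mnist(path="mnist.pkl.gz"):
    """Load a locally downloaded MNIST pickle under Python 3.

    Assumes the file unpickles to ((X_train, y_train), (X_test, y_test)),
    the layout of the file used in the question.
    """
    # The context manager closes the file even if unpickling raises.
    with gzip.open(path, "rb") as f:
        # encoding="bytes" mirrors the question's code; it only matters
        # when the pickle was written by Python 2.
        data = pickle.load(f, encoding="bytes")
    (X_train, y_train), (X_test, y_test) = data
    return (X_train, y_train), (X_test, y_test)
```

Usage would then be (X_train, y_train), (X_test, y_test) = load_local_mnist("mnist.pkl.gz").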

but I'm not really sure how to work with it.

I am trying to print the shapes like so:

print(X_train.shape)
print(y_train.shape)

but this gives the output:

(60000, 28, 28)
(60000,)

which doesn't really make sense to me. What does this actually mean? How can I print it more meaningfully?

Simon Kiely
  • Why does it not make sense to you? What exactly are you expecting? – Dr. Snoopy Jun 06 '19 at 11:52
  • @MatiasValdenegro Hi Matias, thanks for the response. I was just wondering what each numpy array holds and how this data is stored/how I can show it meaningfully. I am kind of confused by the need for 4 arrays. – Simon Kiely Jun 06 '19 at 12:35

1 Answer


The shape of your X_train means that you have 60,000 examples of shape (28, 28), so basically 60,000 images of 28 by 28 pixels. They are grayscale (a single channel), because there is no third channel dimension.
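To see this more concretely than with a raw print(data), you can summarize each array's shape, dtype and value range, and index the first axis to pull out a single image. A minimal sketch: summarize is a hypothetical helper, and the small zero-filled arrays are stand-ins with MNIST's layout that you would replace with the arrays loaded from mnist.pkl.gz.

```python
import numpy as np


def summarize(name, arr):
    """Return a one-line description of an array: shape, dtype, value range."""
    return (f"{name}: shape={arr.shape}, dtype={arr.dtype}, "
            f"min={arr.min()}, max={arr.max()}")


# Stand-ins with MNIST's layout (5 images of 28x28 pixels and 5 labels);
# replace them with the X_train / y_train loaded from mnist.pkl.gz.
X_train = np.zeros((5, 28, 28), dtype=np.uint8)
y_train = np.zeros(5, dtype=np.uint8)

print(summarize("X_train", X_train))
print(summarize("y_train", y_train))

# Indexing the first axis picks out one image, which is a 28x28 grid.
first_image = X_train[0]
print(first_image.shape)  # (28, 28)
```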

For your y_train, it means that you have 60,000 labels, one for each corresponding image.
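Because images and labels are paired by position along the first axis, the two arrays must have the same length, and counting the labels tells you how many images show each digit. A sketch with small stand-in arrays (the label values are arbitrary); with the real file you would use X_train and y_train directly.

```python
import numpy as np

# Stand-in data with MNIST's layout: 6 "images" and their 6 labels.
# With the real file, use X_train and y_train from mnist.pkl.gz instead.
X_train = np.zeros((6, 28, 28), dtype=np.uint8)
y_train = np.array([5, 0, 4, 1, 9, 5], dtype=np.uint8)

# Images and labels line up along the first axis: y_train[i] labels X_train[i].
assert X_train.shape[0] == y_train.shape[0]

# Count how many examples carry each digit 0-9.
counts = np.bincount(y_train, minlength=10)
for digit, count in enumerate(counts):
    print(f"digit {digit}: {count} images")
```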

If you want to display an image to see what it looks like, you can do this (here, the first image):

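import matplotlib.pyplot as plt
plt.imshow(X_train[0, :, :], cmap='gray')  # draw the first 28x28 image in grayscale
plt.title("image label: " + str(y_train[0]), fontsize=14)  # show its label as the title
plt.show()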
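If you only want to "print" an image in a terminal without matplotlib, you can render its pixels as characters. This is a rough sketch: image_to_text is a hypothetical helper, and the threshold of 128 assumes pixel values in the 0-255 range; the stand-in image is a simple vertical bar, so use X_train[0] with the real data.

```python
import numpy as np


def image_to_text(img, threshold=128):
    """Render a 2-D grayscale image as text: '#' where the pixel value is
    at least `threshold`, '.' elsewhere (the default assumes 0-255 values)."""
    return "\n".join(
        "".join("#" if px >= threshold else "." for px in row) for row in img
    )


# Stand-in 28x28 image with a vertical bar; use X_train[0] with real data.
img = np.zeros((28, 28), dtype=np.uint8)
img[4:24, 13:15] = 255
print(image_to_text(img))
```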

Does that make it clearer?

Thibault Bacqueyrisses
  • That is great, thank you. The only thing that confuses me is the labels: the label for the image at position [0] in X_train is y_train[0], yes? – Simon Kiely Jun 06 '19 at 12:32
  • In this situation, the 'label' represents which digit is in the image. So if the image at X_train[0] shows a '5', the label at y_train[0] will be 5. y_train contains the labels, X_train contains the images. – Thibault Bacqueyrisses Jun 06 '19 at 12:40
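Since label i belongs to image i, you can use the labels to pick out every image of one digit with a boolean mask. A sketch on small stand-in arrays (values chosen only so the result is easy to check); with the real data, X_train[y_train == 5] gives all training images labeled 5.

```python
import numpy as np

# Stand-in data; with the real file use X_train and y_train from mnist.pkl.gz.
X_train = np.arange(5 * 28 * 28, dtype=np.int64).reshape(5, 28, 28)
y_train = np.array([5, 0, 4, 1, 5])

# The mask is True where the label equals the wanted digit; indexing with it
# keeps exactly those images, in their original order.
mask = y_train == 5
fives = X_train[mask]
print(fives.shape)  # (2, 28, 28)
```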