
The infimnist project page at http://leon.bottou.org/projects/infimnist says:

Generating files containing the MNIST8M training set:

    $ infimnist lab 10000 8109999 > mnist8m-labels-idx1-ubyte
    $ infimnist pat 10000 8109999 > mnist8m-patterns-idx3-ubyte

However, I fail to see why the range runs from 10,000 to 8,109,999. Even if I compute 8,109,999 - 10,000, the result still doesn't make sense to me.

To me, 8M would mean 8,000,000 images: since the first 10,000 indices end at 9,999, I would start at 10,000 and stop at 8,009,999.

Does anyone understand why the upper bound is 8,109,999?

KenobiBastila

1 Answer

According to a fellow Kaggle user, this is why:

The 8M dataset consists of the original images plus 134 distortions per original. So there are

135 * 60,000 = 8,100,000

training images.

Adding the 10,000 test images gives 8,110,000 images in total.

The test images occupy indices 0 to 10,000 - 1 = 9,999, and the training images occupy indices 10,000 to 8,110,000 - 1 = 8,109,999. Because the range is inclusive, it contains 8,109,999 - 10,000 + 1 = 8,100,000 images.
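The arithmetic above can be checked with a short sketch. The constant and function names here are made up for illustration; the numbers (10,000 test images, 60,000 original training images, 134 distortions per original) are the ones quoted in this answer.

```python
# Figures quoted in the answer above; the names are illustrative only.
N_TEST = 10_000            # test images occupy indices 0 .. 9,999
N_TRAIN_ORIGINAL = 60_000  # original MNIST training images
N_DISTORTIONS = 134        # distorted copies generated per original image


def mnist8m_index_range():
    """Return (first_index, last_index, count) for the MNIST8M training set."""
    # Each original image contributes itself plus its distortions.
    n_train = (1 + N_DISTORTIONS) * N_TRAIN_ORIGINAL
    first = N_TEST                # training indices start right after the test set
    last = N_TEST + n_train - 1   # inclusive upper bound
    return first, last, n_train


first, last, n_train = mnist8m_index_range()
print(first, last, n_train)  # 10000 8109999 8100000
```

The two bounds match the arguments passed to `infimnist` in the question, and the count matches the 8,100,000 figure.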

I hope this helps.

The dataset is also available here:

https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html

There you can see that it lists "# of data: 8,100,000".
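If you generate the files yourself, you can confirm the count by reading the file header. This is a minimal sketch that assumes the output follows the standard MNIST IDX layout, as the `idx1-ubyte` / `idx3-ubyte` file names suggest: two zero bytes, a type code, the number of dimensions, then one big-endian 32-bit size per dimension. The helper name `idx_item_count` is hypothetical.

```python
import struct


def idx_item_count(header: bytes) -> int:
    """Return the first dimension (number of items) from an IDX file header.

    Assumes the standard IDX layout: bytes 0-1 are zero, byte 2 is the
    data type code, byte 3 is the number of dimensions, and each dimension
    size follows as a big-endian unsigned 32-bit integer.
    """
    if len(header) < 8:
        raise ValueError("need at least 8 header bytes")
    zero, _dtype, ndim = struct.unpack(">HBB", header[:4])
    if zero != 0 or ndim < 1:
        raise ValueError("not an IDX header")
    (count,) = struct.unpack(">I", header[4:8])
    return count


# Usage (assumes the file was generated with the command from the question):
# with open("mnist8m-labels-idx1-ubyte", "rb") as f:
#     print(idx_item_count(f.read(8)))
```

If the assumption about the format holds, the value read from either generated file should be 8,100,000.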

KenobiBastila