Why is the InfiMNIST (MNIST) size of 8M examples calculated as 8 109 999 examples?

Question

On the website: http://leon.bottou.org/projects/infimnist

It says:

Generating files containing the MNIST8M training set:

$ infimnist lab 10000 8109999 > mnist8m-labels-idx1-ubyte
$ infimnist pat 10000 8109999 > mnist8m-patterns-idx3-ubyte

However, I fail to see why the range is from 10 000 to 8 109 999. Even if I compute 8 109 999 - 10 000, it still doesn't make sense to me.

To me, 8M would be 8 000 000 + 9 999, because I would end at 9 999 and then go from 10 000 to 8 009 999, which would be 8 million images.

Does anyone understand why it's calculated as 8 109 999?

I really would like to understand why the -2. – KenobiBastila Apr 22 '16 at 00:56

score 0 · Accepted Answer · answered Dec 12 '15 at 15:42

According to a fellow Kaggle user, this is why:

The 8M dataset is the original images plus 134 distortions per original image. So there are

135*60,000 = 8,100,000

training images.

Adding the 10,000 test images, you get 8,110,000 images.

The test images are from index 0 to 10,000-1=9,999 and the training images are from index 10,000 to 8,110,000-1 = 8,109,999.
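The arithmetic above can be checked with a short Python sketch. The constants are the figures quoted in this answer; the variable names and the inclusive_count helper are only illustrative. The key point is that both bounds of the range are inclusive, so the count is last - first + 1.

```python
# Figures quoted in the answer above.
NUM_TEST = 10_000            # MNIST test images, indices 0..9,999
NUM_TRAIN_ORIG = 60_000      # original MNIST training images
DISTORTIONS_PER_IMAGE = 134  # distorted copies per original image


def inclusive_count(first, last):
    """Number of indices in the closed range [first, last]."""
    return last - first + 1


# Each original image contributes itself plus its distortions.
num_train = (1 + DISTORTIONS_PER_IMAGE) * NUM_TRAIN_ORIG

# Training indices start right after the test images.
first_train = NUM_TEST
last_train = NUM_TEST + num_train - 1

print(num_train)                                  # 8100000
print(first_train, last_train)                    # 10000 8109999
print(inclusive_count(first_train, last_train))   # 8100000

# The range proposed in the question would hold only 8,000,000 images.
print(inclusive_count(10_000, 8_009_999))         # 8000000
```

So "8M" is a rounded name: the range 10 000 to 8 109 999 holds 8,100,000 training examples, not exactly 8,000,000.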

I hope this helps.

The original dataset is also here:

https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html

You can see that "# of data: 8,100,000"
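If you generate the files yourself, one way to confirm the count is to read the file header. The sketch below assumes the output follows the standard MNIST IDX layout that the file names (idx1-ubyte, idx3-ubyte) suggest: two zero bytes, a type code, the number of dimensions, then one big-endian 32-bit size per dimension. The read_idx_header function is a hypothetical helper, and the header here is synthetic rather than read from a real infimnist output.

```python
import io
import struct


def read_idx_header(stream):
    """Return (type_code, dims) parsed from the start of an IDX stream."""
    zero1, zero2, type_code, ndim = struct.unpack(">BBBB", stream.read(4))
    if zero1 != 0 or zero2 != 0:
        raise ValueError("not an IDX file: bad magic number")
    # One big-endian unsigned 32-bit size per dimension.
    dims = struct.unpack(">" + "I" * ndim, stream.read(4 * ndim))
    return type_code, dims


# Synthetic header standing in for mnist8m-labels-idx1-ubyte (assumed layout):
# type code 0x08 (unsigned byte), 1 dimension of 8,100,000 entries.
fake_labels = io.BytesIO(struct.pack(">BBBBI", 0, 0, 0x08, 1, 8_100_000))

type_code, dims = read_idx_header(fake_labels)
print(hex(type_code), dims)  # 0x8 (8100000,)
```

For a real file, pass open("mnist8m-labels-idx1-ubyte", "rb") instead of the in-memory buffer; if the layout assumption holds, the first dimension should be 8,100,000.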
