Converting MNIST data to LMDB with Python results in a very large database

Question

I am currently experimenting with the LeNet model provided by Caffe.

The example (in path/to/caffe/examples/mnist/convert_mnist_data.cpp) provides a C++ program that converts the MNIST data to LMDB.

I wrote a Python program to do the same thing, but the resulting LMDB (480 MB) is much larger than the one produced by the C++ program (60 MB).

The test accuracy is almost the same (98%).

I want to know why the sizes differ so much.

Here is the program. I use the mnist module (https://pypi.python.org/pypi/python-mnist/) to load the binary MNIST data.

from mnist import MNIST
import numpy as np
import cv2
import lmdb
import caffe

# Load the raw MNIST training set from ./data.
mndata = MNIST('./data')
images, labels = mndata.load_training()
labels = np.array(labels)
# One byte per pixel, so each image serializes to 28 * 28 = 784 bytes.
images = np.array(images).reshape(len(labels), 28, 28).astype(np.uint8)

print(type(images[0][0][0]))

count = 0
env = lmdb.open('mnist_lmdb', map_size=1000*1000*1000)

txn = env.begin(write=True)
for i in range(len(labels)):
    print(i)
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = 1
    datum.height = 28
    datum.width = 28
    datum.data = images[i].tobytes()
    # Convert the NumPy integer to a plain int for the protobuf field.
    datum.label = int(labels[i])
    # Zero-padded keys sort in insertion order; encode them to bytes for LMDB.
    str_id = '{:08}'.format(i)
    txn.put(str_id.encode('ascii'), datum.SerializeToString())

    count = count + 1

    # Commit in batches of 1000 records.
    if count % 1000 == 0:
        txn.commit()
        txn = env.begin(write=True)

# Commit whatever is left in the final partial batch.
if count % 1000 != 0:
    txn.commit()
env.close()
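
Since the stored size grows with the number of bytes written per record, one quick sanity check is to measure how many bytes `tobytes()` produces for a single 28x28 image with and without the `uint8` cast. The sketch below is only an illustration: it uses a hypothetical all-zero array in place of a real MNIST image, and the default dtype NumPy picks for Python integers may vary by platform.

```python
import numpy as np

# Hypothetical stand-in for one MNIST image; the pixel values do not matter
# when measuring how many bytes tobytes() produces.
image_default = np.array([[0] * 28 for _ in range(28)])  # dtype chosen by NumPy
image_uint8 = image_default.astype(np.uint8)             # one byte per pixel

bytes_default = len(image_default.tobytes())
bytes_uint8 = len(image_uint8.tobytes())

print(image_default.dtype, bytes_default)
print(image_uint8.dtype, bytes_uint8)
```

If the value assigned to `datum.data` is larger than 784 bytes per image, the cast is probably not being applied where expected.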

Thank you.

score 0 · Answer 1 · edited Aug 12 '16 at 04:30

env = lmdb.open('mnist_lmdb', map_size=1000*1000*1000)

The database size depends mainly on map_size, so you can reduce the map_size.
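
As a rough illustration of picking a smaller value, the sketch below estimates a `map_size` from the record count and payload size. The helper `estimate_map_size` is hypothetical, and its `overhead_bytes` and `headroom` defaults are illustrative guesses rather than LMDB constants.

```python
def estimate_map_size(num_records, payload_bytes, overhead_bytes=64, headroom=2.0):
    """Return a rough upper bound, in bytes, to pass as an LMDB map_size.

    overhead_bytes (key, label, serialization framing) and headroom are
    assumptions made for illustration, not values defined by LMDB.
    """
    per_record = payload_bytes + overhead_bytes
    return int(num_records * per_record * headroom)


# MNIST training set: 60000 images of 28 * 28 single-byte pixels.
mnist_map_size = estimate_map_size(60000, 28 * 28)
print(mnist_map_size)
```

The result (about 100 MB here) could then be passed as `lmdb.open('mnist_lmdb', map_size=mnist_map_size)`. If the estimate turns out to be too small, writes will fail once the map is full, so the headroom factor should be raised in that case.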

edited Aug 12 '16 at 04:30 by piyushj

answered Aug 12 '16 at 03:26 by junyu
