I have a dataset of 71,000 images (about 500 MB in total), and I'm trying to load it into LMDB for the Caffe framework. I've used this code:
    import lmdb

    with lmdb.open(train_lmdb, map_size=int(1e12)) as env:
        with env.begin(write=True) as in_txn:
            for in_idx, img_path in enumerate(train_data):
                # ... load img and label from img_path ...
                datum = make_datum(img, label)
                in_txn.put('{:0>5d}'.format(in_idx), datum.SerializeToString())
train_data is the list of image file paths.
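make_datum (not shown) just packs an image and its label into a Caffe Datum, roughly like this sketch built on caffe.io.array_to_datum:

    import caffe

    def make_datum(img, label):
        # img is an HxWxC uint8 numpy array (BGR order, as Caffe expects);
        # array_to_datum wants CxHxW, so transpose the axes first
        return caffe.io.array_to_datum(img.transpose(2, 0, 1), label)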
But the problem is that after a while, my 8 GB of memory fills up and Ubuntu freezes.
I've read that the environment should be closed and reopened periodically, but as far as I can tell, LMDB only writes the data to the database file once it has read everything (the file size doesn't change until the very end). And my memory is far bigger than the dataset, so it shouldn't fill up at all.
My questions are:

1. Is there any way to reopen the DB while saving its current state, so the job can resume after reopening?
2. Is there any way to tell LMDB to write each record to the database as it is read, one by one, rather than all at once?
3. Why does it need so much memory?
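To make question 2 concrete, this is the kind of thing I'm imagining, with every put committed in its own transaction (an untested sketch):

    import lmdb

    env = lmdb.open(train_lmdb, map_size=int(1e12))
    for in_idx, img_path in enumerate(train_data):
        # ... load img and label from img_path ...
        datum = make_datum(img, label)
        # one write transaction per record; the `with` block commits on
        # normal exit, so each record should reach the DB immediately
        # (on Python 3 the key would need .encode('ascii'))
        with env.begin(write=True) as txn:
            txn.put('{:0>5d}'.format(in_idx), datum.SerializeToString())
    env.close()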
PS: When I put this inside the for loop:
    if counter % 1000 == 0:
        env.close()
        env = lmdb.open(train_lmdb, map_size=int(1e12))
        in_txn = env.begin(write=True)
it runs to the end, but the database file is empty!
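My guess is that the file stays empty because I never commit the transaction before closing the environment; maybe the loop body should instead look like this (again untested):

    if counter % 1000 == 0:
        in_txn.commit()                 # flush the last 1000 puts to disk
        in_txn = env.begin(write=True)  # start a fresh transaction

with a final in_txn.commit() and env.close() after the loop.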
PS2: I've tried passing writemap=True to lmdb.open; it grows the LMDB file to 1 TB, but there is still no data in it!
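Since the dataset is only about 500 MB, I also wonder whether a much smaller map_size would avoid the huge up-front file that writemap=True produces; something like this (a guess on my part):

    # 10 GB map: far smaller than 1 TB, still plenty for ~500 MB of images
    env = lmdb.open(train_lmdb, map_size=10 * 1024**3)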