I have a dataset of 71,000 images (about 500 MB in total), and I'm trying to load it into LMDB for the Caffe framework. I've used this code:
    import lmdb

    with lmdb.open(train_lmdb, map_size=int(1e12)) as env:
        with env.begin(write=True) as in_txn:
            for in_idx, img_path in enumerate(train_data):
                # ... load img and label from img_path ...
                datum = make_datum(img, label)
                in_txn.put('{:0>5d}'.format(in_idx), datum.SerializeToString())
train_data is the list of image file paths.
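make_datum (not shown) just packs an image and its label into a Caffe Datum, roughly like this sketch built on caffe.io.array_to_datum:

    import caffe

    def make_datum(img, label):
        # img is an HxWxC uint8 numpy array (BGR order, as Caffe expects);
        # array_to_datum wants CxHxW, so transpose the axes first
        return caffe.io.array_to_datum(img.transpose(2, 0, 1), label)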
But the problem is that after a while, my 8 GB of memory fills up and Ubuntu freezes.
I've read that the environment should be closed and reopened periodically, but as far as I can tell, LMDB only writes the data to the database file once it has read everything (the file size doesn't change until the very end). And my memory is far bigger than the dataset, so it shouldn't fill up at all.
My questions are:

1. Is there any way to reopen the DB while saving its current state, so the job can resume after reopening?
2. Is there any way to tell LMDB to write each record to the database as it is read, one by one, rather than all at once?
3. Why does it need so much memory?
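To make question 2 concrete, this is the kind of thing I'm imagining, with every put committed in its own transaction (an untested sketch):

    import lmdb

    env = lmdb.open(train_lmdb, map_size=int(1e12))
    for in_idx, img_path in enumerate(train_data):
        # ... load img and label from img_path ...
        datum = make_datum(img, label)
        # one write transaction per record; the `with` block commits on
        # normal exit, so each record should reach the DB immediately
        # (on Python 3 the key would need .encode('ascii'))
        with env.begin(write=True) as txn:
            txn.put('{:0>5d}'.format(in_idx), datum.SerializeToString())
    env.close()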
PS: When I put this inside the for loop:
    if counter % 1000 == 0:
        env.close()
        env = lmdb.open(train_lmdb, map_size=int(1e12))
        in_txn = env.begin(write=True)
it runs to the end, but the database file is empty!
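My guess is that the file stays empty because I never commit the transaction before closing the environment; maybe the loop body should instead look like this (again untested):

    if counter % 1000 == 0:
        in_txn.commit()                 # flush the last 1000 puts to disk
        in_txn = env.begin(write=True)  # start a fresh transaction

with a final in_txn.commit() and env.close() after the loop.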
PS2: I've tried passing writemap=True to lmdb.open; it grows the LMDB file to 1 TB, but there is still no data in it!
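Since the dataset is only about 500 MB, I also wonder whether a much smaller map_size would avoid the huge up-front file that writemap=True produces; something like this (a guess on my part):

    # 10 GB map: far smaller than 1 TB, still plenty for ~500 MB of images
    env = lmdb.open(train_lmdb, map_size=10 * 1024**3)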