How to write or convert float-type data to leveldb in caffe

Question

I am building a LevelDB database to train a network with the Caffe framework, so I am using "convert_imageset.cpp". That file only writes char-type data to LevelDB, but the data I need to write is float-type. It is pre-processed image data: a set of 4096-dimensional float vectors. How can I write or convert this float data to LevelDB? Alternatively, how can I convert it for use with HDF5Data? Please help.

How do I use it? I have never used it. Please show me how. – guochan zhang, Jan 04 '16 at 15:03
You can see a [python](http://stackoverflow.com/a/31808324/1714410) example of how to prepare a dataset in HDF5 format for Caffe, and how to set up the appropriate data layer. – Shai, Jan 04 '16 at 17:58

Manfredo · Accepted Answer · 2016-03-24T15:36:46.927

2

HDF5 is version 5 of the Hierarchical Data Format. You can manipulate this format with, for example, R (RHDF5 documentation).

Other software that can process HDF5 includes Matlab and Mathematica.
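For the 4096-dimensional float vectors in the question, a minimal Python sketch of the HDF5 route might look like the following. It assumes the `h5py` and `numpy` packages are installed; the function name `write_features_hdf5`, the file names, and the dataset names `data` and `label` are illustrative choices, not something taken from the question.

```python
import os
import tempfile

import h5py
import numpy as np


def write_features_hdf5(h5_path, list_path, features, labels):
    """Write float features and labels to HDF5, plus the list file Caffe reads."""
    features = np.asarray(features, dtype=np.float32)   # shape (N, 4096)
    labels = np.asarray(labels, dtype=np.float32)       # shape (N,)
    if features.shape[0] != labels.shape[0]:
        raise ValueError('features and labels must have the same length')
    with h5py.File(h5_path, 'w') as f:
        # Dataset names are assumed to match the top blob names of the layer.
        f.create_dataset('data', data=features)
        f.create_dataset('label', data=labels)
    # The HDF5Data layer's source is a text file listing one HDF5 path per line.
    with open(list_path, 'w') as f:
        f.write(h5_path + '\n')


if __name__ == '__main__':
    out_dir = tempfile.mkdtemp()
    feats = np.random.rand(8, 4096)   # placeholder for the real feature vectors
    labs = np.arange(8) % 2           # placeholder labels
    write_features_hdf5(os.path.join(out_dir, 'train.h5'),
                        os.path.join(out_dir, 'train_list.txt'), feats, labs)
    print('wrote files to', out_dir)
```

In the network definition, an `HDF5Data` layer would then point its `hdf5_data_param { source: ... }` at the list file, with top names matching the dataset names used here.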

EDIT

A new set of tools called HDFql has been recently released to simplify "managing HDF files through a high-level language like C/C++". You can check it out here

edited Mar 24 '16 at 15:36

answered Jan 04 '16 at 16:51


In fact, I found code like [this](https://github.com/s9xie/DSN/blob/master/cifar-float-extra/convert_cifar_float_data.cpp), but it has parts that I don't understand. What are datum.add_float_data(0) and datum.set_float_data(...)? And what is the difference between label and key? – guochan zhang Jan 05 '16 at 15:03
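The two questions in this comment can be illustrated with a small pure-Python stand-in. This is only a sketch: `FakeDatum` and `make_datum` are hypothetical names, not Caffe classes. The stand-in mirrors a few fields of Caffe's `Datum` message and the way protobuf-generated C++ accessors for a repeated field behave: `add_float_data(v)` appends one element, and `set_float_data(i, v)` overwrites the element at index `i`. The linked file presumably reserves slots with `add_float_data(0)` and then fills them with `set_float_data(...)`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDatum:
    """Hypothetical stand-in mirroring a few fields of Caffe's Datum message."""
    channels: int = 0
    height: int = 0
    width: int = 0
    label: int = 0
    float_data: List[float] = field(default_factory=list)

    def add_float_data(self, value: float) -> None:
        # Like the generated C++ accessor: append one element.
        self.float_data.append(value)

    def set_float_data(self, index: int, value: float) -> None:
        # Like the generated C++ accessor: overwrite an existing element.
        self.float_data[index] = value


def make_datum(vector, label):
    datum = FakeDatum(channels=len(vector), height=1, width=1, label=label)
    for _ in vector:
        datum.add_float_data(0.0)       # reserve one slot per value
    for i, value in enumerate(vector):
        datum.set_float_data(i, value)  # then fill each slot
    return datum


if __name__ == '__main__':
    # A dict stands in for LevelDB/LMDB, which map a key to one serialized record.
    db: Dict[str, FakeDatum] = {}
    db['%08d' % 0] = make_datum([0.5, 1.5, 2.5], label=7)
    # The key only locates the record; the label is stored inside the record.
    print(db['00000000'].label, db['00000000'].float_data)
```

In short, the key is the database lookup string and only has to be unique, while the label is the class value stored in the record that the network trains against.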

score 1 · Answer 2 · edited May 23 '17 at 12:00

import glob
import os
import shutil

import cv2 as cv
import lmdb
import numpy as np
import caffe


def del_and_create(dname):
    # Start from an empty directory so stale records never mix with new ones.
    if os.path.exists(dname):
        shutil.rmtree(dname)
    os.makedirs(dname)


def get_img_datum(image_fn):
    img = cv.imread(image_fn, cv.IMREAD_COLOR)
    # OpenCV returns H x W x C; Caffe expects C x H x W.
    img = img.swapaxes(0, 2).swapaxes(1, 2)
    datum = caffe.io.array_to_datum(img, 0)
    return datum


def get_jnt_datum(joint_fn):
    joint = np.load(joint_fn)
    datum = caffe.io.caffe_pb2.Datum()
    # Store the float vector as a (len, 1, 1) blob in the float_data field.
    datum.channels = len(joint)
    datum.height = 1
    datum.width = 1
    datum.float_data.extend(joint.tolist())
    return datum


def create_dataset():
    img_db_fn = 'img.lmdb'
    del_and_create(img_db_fn)
    img_env = lmdb.Environment(img_db_fn, map_size=1099511627776)
    img_txn = img_env.begin(write=True, buffers=True)

    jnt_db_fn = 'joint.lmdb'
    del_and_create(jnt_db_fn)
    jnt_env = lmdb.Environment(jnt_db_fn, map_size=1099511627776)
    jnt_txn = jnt_env.begin(write=True, buffers=True)

    img_fns = glob.glob('imageData/*.jpg')
    fileCount = len(img_fns)
    print('A total of %d images.' % fileCount)
    jnt_fns = glob.glob('jointData/*.npy')
    jointCount = len(jnt_fns)
    if fileCount != jointCount:
        raise SystemExit('The file counts do not match')

    # Shuffled keys make the (key-sorted) read order random.
    keys = np.arange(fileCount)
    np.random.shuffle(keys)

    for i, (img_fn, jnt_fn) in enumerate(zip(sorted(img_fns), sorted(jnt_fns))):
        img_datum = get_img_datum(img_fn)
        jnt_datum = get_jnt_datum(jnt_fn)
        # LMDB keys must be bytes; the same key is used in both databases.
        key = ('%010d' % keys[i]).encode('ascii')

        img_txn.put(key, img_datum.SerializeToString())
        jnt_txn.put(key, jnt_datum.SerializeToString())

        # Commit periodically and open fresh transactions.
        if i % 10000 == 0:
            img_txn.commit()
            jnt_txn.commit()
            jnt_txn = jnt_env.begin(write=True, buffers=True)
            img_txn = img_env.begin(write=True, buffers=True)

        print('%d %s %s' % (i, os.path.basename(img_fn), os.path.basename(jnt_fn)))

    img_txn.commit()
    jnt_txn.commit()
    img_env.close()
    jnt_env.close()


if __name__ == '__main__':
    create_dataset()

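The above code reads the images from imageData/*.jpg and the label of each image (here a joint vector) from a matching .npy file in jointData/. Images and labels are paired by sorted filename order, so the two directories must contain the same number of files with corresponding names.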
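The key scheme in that code can be sketched on its own with only the standard library. LMDB keeps records sorted by key, so writing each pair under a shuffled, zero-padded key means a sequential read returns the pairs in random order, while the shared key keeps the image and joint databases aligned. The helper name `make_shuffled_keys` and the plain dict standing in for the database are assumptions made for illustration.

```python
import random


def make_shuffled_keys(count, seed=None):
    """Return `count` unique zero-padded keys in random order."""
    indices = list(range(count))
    random.Random(seed).shuffle(indices)
    # Padding to 10 digits makes lexicographic order equal numeric order.
    return ['%010d' % i for i in indices]


if __name__ == '__main__':
    files = ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg']
    keys = make_shuffled_keys(len(files), seed=0)
    # A dict stands in for the database here.
    db = dict(zip(keys, files))
    # Iterating in sorted key order mimics a sequential database read.
    for key in sorted(db):
        print(key, db[key])
```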

Credits: https://github.com/mitmul/deeppose/blob/caffe/scripts/dataset.py

Note: I have seen Shai's answer to another question, which claims that LMDB does not support float-type data. However, it works for me with the latest versions of Caffe and LMDB using this code snippet. As his answer is quite old, it is likely that older versions did not support float-type data.

lol! :D In fact your reply was just 5 months old, but with the current pace in deep learning, even that has to be termed 'too old'.. that's hard but great! :) – Anoop K. Prabhu, Jan 05 '16 at 11:50
Thank you, but I don't know Python, so I can't follow the code; I am using C++. Do you have C++ code instead of Python? I need it. – guochan zhang, Jan 05 '16 at 14:49
I was using C++ until a month ago, and I had absolutely no knowledge of Python. Now I do almost all the preprocessing in Python and feed it to LMDB or HDF5. The above code is almost plug and play. After all, SO is not for spoon-feeding; its intention is to give input whenever a programmer gets stuck :) – Anoop K. Prabhu, Jan 05 '16 at 14:54
In fact, I found code like [this](https://github.com/s9xie/DSN/blob/master/cifar-float-extra/convert_cifar_float_data.cpp), but it has parts that I don't understand. What are datum.add_float_data(0) and datum.set_float_data(...)? And what is the difference between label and key? – guochan zhang, Jan 05 '16 at 14:59
