I am building a LevelDB database to train a network with the Caffe framework, so I am using "convert_imageset.cpp". That file only writes char-type data to LevelDB, but my data is float: it is preprocessed image data, stored as a set of 4096-dimensional float vectors. How can I write or convert this float data to LevelDB? Failing that, how can I convert it to HDF5Data? Please help.

Dan McGrath

2 Answers

HDF5 stands for Hierarchical Data Format, version 5. You can work with this format from R, for example (see the rhdf5 documentation).

Matlab and Mathematica can also read and write HDF5 files.

EDIT

A new set of tools called HDFql has been recently released to simplify "managing HDF files through a high-level language like C/C++". You can check it out here
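For the HDF5Data part of the question, here is a minimal Python sketch using h5py, under some assumptions: the features are already in memory as an N x 4096 array, the network's HDF5Data layer has tops named "data" and "label" (Caffe looks up HDF5 datasets by the top names), and the layer's source is a text file listing one .h5 path per line. The helper name write_hdf5, the file names and the random stand-in features are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def write_hdf5(features, labels, h5_path, list_path):
    """Write float feature vectors and labels to one HDF5 file for Caffe."""
    # Store both arrays as float32, Caffe's default blob type.
    features = np.asarray(features, dtype=np.float32)
    labels = np.asarray(labels, dtype=np.float32)
    if features.shape[0] != labels.shape[0]:
        raise ValueError('features and labels must have the same length')

    with h5py.File(h5_path, 'w') as f:
        # Dataset names are assumed to match the layer's top names.
        f.create_dataset('data', data=features)
        f.create_dataset('label', data=labels)

    # The HDF5Data layer reads a text file that lists the .h5 files to load.
    with open(list_path, 'w') as f:
        f.write(h5_path + '\n')


# Stand-in data: 8 random 4096-dimensional vectors with integer class labels.
out_dir = tempfile.mkdtemp()
h5_path = os.path.join(out_dir, 'train.h5')
list_path = os.path.join(out_dir, 'train_list.txt')
rng = np.random.RandomState(0)
features = rng.rand(8, 4096)
labels = rng.randint(0, 10, size=8)
write_hdf5(features, labels, h5_path, list_path)
```

In the network definition, hdf5_data_param.source would then point at the list file (train_list.txt here), not at the .h5 file itself.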

Manfredo
  • In fact, I found code like [this](https://github.com/s9xie/DSN/blob/master/cifar-float-extra/convert_cifar_float_data.cpp), but there are parts I don't understand. What do datum.add_float_data(0) and datum.set_float_data(...) do? And what is the difference between the label and the key? – guochan zhang Jan 05 '16 at 15:03
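import glob
import os
import shutil

import caffe
import cv2 as cv
import lmdb
import numpy as np

# Remove any existing database directory and start from an empty one.
def del_and_create(dname):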
    if os.path.exists(dname):
        shutil.rmtree(dname)
    os.makedirs(dname)

def get_img_datum(image_fn):
    img = cv.imread(image_fn, cv.IMREAD_COLOR)
    img = img.swapaxes(0, 2).swapaxes(1, 2)
    datum = caffe.io.array_to_datum(img, 0)
    return datum

def get_jnt_datum(joint_fn):
    joint = np.load(joint_fn)
    # Store the vector as a C x 1 x 1 blob in the repeated float_data field.
    datum = caffe.io.caffe_pb2.Datum()
    datum.channels = len(joint)
    datum.height = 1
    datum.width = 1
    datum.float_data.extend(joint.tolist())

    return datum

def create_dataset():
    img_db_fn = 'img.lmdb'
    del_and_create(img_db_fn)
    img_env = lmdb.Environment(img_db_fn, map_size=1099511627776)
    img_txn = img_env.begin(write=True, buffers=True)

    jnt_db_fn = 'joint.lmdb'
    del_and_create(jnt_db_fn)
    jnt_env = lmdb.Environment(jnt_db_fn, map_size=1099511627776)
    jnt_txn = jnt_env.begin(write=True, buffers=True)

    img_fns = glob.glob('imageData/*.jpg')
    fileCount = len(img_fns)
    print('A total of %d images.' % fileCount)
    jnt_fns = glob.glob('jointData/*.npy')
    jointCount = len(jnt_fns)
    if fileCount != jointCount:
        # Every image needs exactly one matching .npy label file.
        raise SystemExit('The file counts do not match')

    keys = np.arange(fileCount)
    np.random.shuffle(keys)

    for i, (img_fn, jnt_fn) in enumerate( zip(sorted(img_fns), sorted(jnt_fns)) ):
        img_datum = get_img_datum(img_fn)
        jnt_datum = get_jnt_datum(jnt_fn)
        # LMDB keys are bytes; zero-padding keeps sorted order equal to numeric order.
        key = ('%010d' % keys[i]).encode('ascii')

        img_txn.put(key, img_datum.SerializeToString())
        jnt_txn.put(key, jnt_datum.SerializeToString())

        if i % 10000 == 0:
            img_txn.commit()
            jnt_txn.commit()
            jnt_txn = jnt_env.begin(write=True, buffers=True)
            img_txn = img_env.begin(write=True, buffers=True)

        print('%d %s %s' % (i, os.path.basename(img_fn), os.path.basename(jnt_fn)))

    img_txn.commit()
    jnt_txn.commit()
    img_env.close()
    jnt_env.close()

The code above expects the images in one directory (imageData/*.jpg) and the label of each image as a .npy file in another (jointData/*.npy). The two sorted file lists are paired one to one.

Credits: https://github.com/mitmul/deeppose/blob/caffe/scripts/dataset.py

Note: I had seen Shai's answer to another question, which claims that LMDB does not support float-type data. However, it works for me with the latest versions of Caffe and LMDB using this code snippet. Since his answer is rather old, it is quite likely that older versions did not support float-type data.
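On the keys used in the script: both databases get the same shuffled, zero-padded key for a given image/label pair, so reading the two LMDBs in key order yields matching records in the same random order. Here is a small stand-alone sketch of just that idea, assuming plain dicts as stand-ins for the two databases. The helper make_keys and the file names are made up for illustration.

```python
import random


def make_keys(n, seed=0):
    """Return n unique zero-padded byte keys in shuffled order."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [('%010d' % k).encode('ascii') for k in order]


img_names = ['a.jpg', 'b.jpg', 'c.jpg']
jnt_names = ['a.npy', 'b.npy', 'c.npy']
keys = make_keys(len(img_names))

# Plain dicts stand in for the image and joint databases.
img_db = dict(zip(keys, img_names))
jnt_db = dict(zip(keys, jnt_names))

# LMDB iterates keys in sorted byte order, so both databases
# return their records in the same shuffled sequence.
for key in sorted(img_db):
    print(key, img_db[key], jnt_db[key])
```

The zero padding matters because keys are compared as bytes: without it, '10' would sort before '9'.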

Anoop K. Prabhu
  • You call me "old" !? ;) – Shai Jan 05 '16 at 09:55
  • lol! :D In fact your reply was just 5 months old, but with the current pace of deep learning, even that has to be termed 'too old'. That's hard but great! :) – Anoop K. Prabhu Jan 05 '16 at 11:50
  • Thank you, but I don't know Python, so I can't follow this code; I am using C++. Do you have C++ code instead of Python? I need it. – guochan zhang Jan 05 '16 at 14:49
  • I was using C++ until a month ago and knew nothing about Python. Now I do almost all the preprocessing in Python and feed it to LMDB or HDF5. The code above is almost plug and play. After all, SO is not for spoon-feeding; its intention is to give input whenever a programmer gets stuck :) – Anoop K. Prabhu Jan 05 '16 at 14:54
  • Thank you, but I don't know Python, so I can't follow this code; I am using C++. Do you have C++ code instead of Python? I need it. In fact, I found code like [this](https://github.com/s9xie/DSN/blob/master/cifar-float-extra/convert_cifar_float_data.cpp), but there are parts I don't understand. What do datum.add_float_data(0) and datum.set_float_data(...) do? And what is the difference between the label and the key? – guochan zhang Jan 05 '16 at 14:59
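On the last question: add_float_data and set_float_data are the accessors that protobuf generates in C++ for Datum's repeated float_data field. add_float_data(v) appends one value, and set_float_data(i, v) overwrites the value at index i. The label is stored inside the Datum as the training target, while the key is only the database-level identifier for the record. Below is a rough pure-Python sketch of that behaviour. FloatDatum is a hypothetical stand-in written for illustration, not the real caffe::Datum class, and the reserve-then-fill pattern is only my guess at what the linked converter does.

```python
class FloatDatum(object):
    """Hypothetical stand-in that mimics a few caffe::Datum accessors."""

    def __init__(self):
        self.channels = self.height = self.width = 0
        self.label = 0
        self._float_data = []

    def add_float_data(self, value):
        # Append one element to the repeated field.
        self._float_data.append(float(value))

    def set_float_data(self, index, value):
        # Overwrite an element that already exists.
        self._float_data[index] = float(value)

    def float_data_size(self):
        return len(self._float_data)


def vector_to_datum(vector, label):
    """Pack a float vector as a C x 1 x 1 record with its label."""
    datum = FloatDatum()
    datum.channels, datum.height, datum.width = len(vector), 1, 1
    datum.label = label  # the training target, stored inside the record
    for _ in vector:
        datum.add_float_data(0)  # reserve one slot per element
    for i, v in enumerate(vector):
        datum.set_float_data(i, v)  # then fill each slot
    return datum


datum = vector_to_datum([0.5, 1.25, -2.0], label=7)
# The key lives outside the Datum; the database uses it to store and order records.
key = ('%08d' % 0).encode('ascii')
```

For a 4096-dimensional vector the same pattern gives channels = 4096, height = 1 and width = 1.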