I am creating a LevelDB database to train a network with the Caffe framework, so I am using "convert_imageset.cpp". This file writes only char-type data to LevelDB, but my data is float: it is pre-processed image data, stored as a set of 4096-dimensional vectors. How can I write or convert this float data to LevelDB? Failing that, how can I convert it to HDF5Data? Please help.
-
why don't you use `"HDF5Data"`? – Shai Jan 04 '16 at 13:15
-
What is "HDF5Data", and how do I convert my data to it? – guochan zhang Jan 04 '16 at 14:47
-
How do I use it? I have never used it. Please show me how. – guochan zhang Jan 04 '16 at 15:03
-
You can see a [python](http://stackoverflow.com/a/31808324/1714410) example on how to prepare a dataset in hdf5 format for caffe, and how to set the appropriate data layer. – Shai Jan 04 '16 at 17:58
2 Answers
HDF5 stands for Hierarchical Data Format (version 5). You can manipulate this format with, for example, R (see the rhdf5 documentation). Other software that can process HDF5 includes Matlab and Mathematica.
EDIT
A new set of tools called HDFql has recently been released to simplify "managing HDF files through a high-level language like C/C++".
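For the 4096-dimensional float vectors in the question, a minimal Python sketch of writing an HDF5 file that Caffe's `HDF5Data` layer can read might look like the following. It assumes the `h5py` and `numpy` packages are installed. The function name `write_hdf5_for_caffe`, the file names, and the random stand-in data are illustrative and do not come from the original posts. The dataset names `data` and `label` are chosen to match the `top` names that the layer would declare.

```python
import os
import tempfile

import h5py
import numpy as np


def write_hdf5_for_caffe(features, labels, out_dir, name="train"):
    """Write float features and labels to <name>.h5 plus a list file (hypothetical helper)."""
    features = np.asarray(features, dtype=np.float32)  # shape (N, 4096)
    labels = np.asarray(labels, dtype=np.float32)      # shape (N,)
    if features.shape[0] != labels.shape[0]:
        raise ValueError("features and labels must have the same number of rows")

    h5_path = os.path.join(out_dir, name + ".h5")
    with h5py.File(h5_path, "w") as f:
        # Dataset names should match the `top` names of the HDF5Data layer.
        f.create_dataset("data", data=features)
        f.create_dataset("label", data=labels)

    # The layer's `source` is a text file listing one .h5 path per line.
    list_path = os.path.join(out_dir, name + "_h5_list.txt")
    with open(list_path, "w") as f:
        f.write(h5_path + "\n")
    return h5_path, list_path


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    rng = np.random.RandomState(0)
    feats = rng.rand(8, 4096)          # stand-in for the real feature vectors
    labs = rng.randint(0, 10, size=8)  # stand-in class labels
    h5_path, list_path = write_hdf5_for_caffe(feats, labs, out_dir)
    with h5py.File(h5_path, "r") as f:
        print(f["data"].shape, f["data"].dtype)
```

In the network definition, the `source` field of the layer's `hdf5_data_param` would then point at the generated list file rather than at the `.h5` file itself.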

-
In fact, I found code like [this](https://github.com/s9xie/DSN/blob/master/cifar-float-extra/convert_cifar_float_data.cpp), but it has parts that I don't understand. What are datum.add_float_data(0) and datum.set_float_data(...)? And what is the difference between label and key? – guochan zhang Jan 05 '16 at 15:03

    import glob
    import os
    import shutil
    import sys

    import caffe
    import cv2 as cv
    import lmdb
    import numpy as np


    def del_and_create(dname):
        # Start from an empty directory for each database.
        if os.path.exists(dname):
            shutil.rmtree(dname)
        os.makedirs(dname)


    def get_img_datum(image_fn):
        img = cv.imread(image_fn, cv.IMREAD_COLOR)
        # OpenCV returns H x W x C; Caffe expects C x H x W.
        img = img.swapaxes(0, 2).swapaxes(1, 2)
        # The label is a placeholder (0); the real targets go into joint.lmdb.
        datum = caffe.io.array_to_datum(img, 0)
        return datum


    def get_jnt_datum(joint_fn):
        # Assumes each .npy file holds a 1-D float vector.
        joint = np.load(joint_fn)
        datum = caffe.io.caffe_pb2.Datum()
        datum.channels = len(joint)
        datum.height = 1
        datum.width = 1
        # float_data is the repeated float field of Datum; the byte field stays empty.
        datum.float_data.extend(joint.tolist())
        return datum


    def create_dataset():
        img_db_fn = 'img.lmdb'
        del_and_create(img_db_fn)
        img_env = lmdb.Environment(img_db_fn, map_size=1099511627776)
        img_txn = img_env.begin(write=True, buffers=True)

        jnt_db_fn = 'joint.lmdb'
        del_and_create(jnt_db_fn)
        jnt_env = lmdb.Environment(jnt_db_fn, map_size=1099511627776)
        jnt_txn = jnt_env.begin(write=True, buffers=True)

        img_fns = glob.glob('imageData/*.jpg')
        fileCount = len(img_fns)
        print('A total of %d images.' % fileCount)

        jnt_fns = glob.glob('jointData/*.npy')
        jointCount = len(jnt_fns)
        if fileCount != jointCount:
            print('The file counts do not match')
            sys.exit(1)

        # Shuffled integer keys randomize the order in which entries are stored.
        keys = np.arange(fileCount)
        np.random.shuffle(keys)

        for i, (img_fn, jnt_fn) in enumerate(zip(sorted(img_fns), sorted(jnt_fns))):
            img_datum = get_img_datum(img_fn)
            jnt_datum = get_jnt_datum(jnt_fn)
            # The same key in both databases keeps image and target aligned.
            # LMDB keys must be bytes under Python 3.
            key = ('%010d' % keys[i]).encode('ascii')

            img_txn.put(key, img_datum.SerializeToString())
            jnt_txn.put(key, jnt_datum.SerializeToString())

            # Commit periodically so a single transaction does not grow too large.
            if i % 10000 == 0:
                img_txn.commit()
                jnt_txn.commit()
                jnt_txn = jnt_env.begin(write=True, buffers=True)
                img_txn = img_env.begin(write=True, buffers=True)

            print('%d %s %s' % (i, os.path.basename(img_fn), os.path.basename(jnt_fn)))

        img_txn.commit()
        jnt_txn.commit()
        img_env.close()
        jnt_env.close()


    if __name__ == '__main__':
        create_dataset()

The above code expects the images under a given path (`imageData/*.jpg`) and the label of each image as a `.npy` file (`jointData/*.npy`). Images and labels are paired by sorted file name.
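Because the pairing relies on sorted file names, one way to prepare the `.npy` files is to name each one after its image. The sketch below is only an illustration under that assumption: `save_feature_files` is a hypothetical helper, the file names are made up, and the random vectors stand in for real 4096-dimensional features. It needs only `numpy`.

```python
import os
import tempfile

import numpy as np


def save_feature_files(features, image_names, out_dir):
    """Save one float32 .npy file per image, named after the image (hypothetical helper)."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    paths = []
    for name, vec in zip(image_names, features):
        # Reuse the image's base name so both directories sort the same way.
        stem = os.path.splitext(os.path.basename(name))[0]
        path = os.path.join(out_dir, stem + ".npy")
        np.save(path, np.asarray(vec, dtype=np.float32))
        paths.append(path)
    return paths


if __name__ == "__main__":
    out_dir = os.path.join(tempfile.mkdtemp(), "jointData")
    names = ["imageData/img_0001.jpg", "imageData/img_0002.jpg"]
    feats = np.random.rand(len(names), 4096)  # stand-in feature vectors
    for path in save_feature_files(feats, names, out_dir):
        print(path, np.load(path).shape)
```

Zero-padded numbers in the file names (for example `img_0002` rather than `img_2`) keep the sorted order the same as the numeric order.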
Credits: https://github.com/mitmul/deeppose/blob/caffe/scripts/dataset.py
Note: I had seen Shai's answer to a question which claims that LMDB does not support float-type data. However, it works for me with the latest versions of Caffe and LMDB using this code snippet. As his answer is quite old, it is highly likely that older versions did not support float-type data.
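On the difference between key and label: the key is only the name under which a record is stored in the database, while the label is a field inside the serialized `Datum` itself. LMDB keeps records sorted by key bytes, so the zero padding and the shuffling of keys in the snippet both affect the order in which records are stored. The standard-library sketch below illustrates that ordering; `make_shuffled_keys` is a hypothetical helper, not part of Caffe or LMDB.

```python
import random


def make_shuffled_keys(count, seed=None):
    """Return `count` unique zero-padded keys in random order, as bytes."""
    ids = list(range(count))
    random.Random(seed).shuffle(ids)
    return [("%010d" % i).encode("ascii") for i in ids]


if __name__ == "__main__":
    # Without padding, byte-wise sorting does not follow numeric order.
    print(sorted([b"2", b"10", b"1"]))
    # With padding, byte-wise order matches numeric order.
    print(sorted(("%010d" % i).encode("ascii") for i in (2, 10, 1)))
    # Assigning shuffled keys to files in sorted order stores them in random order.
    print(make_shuffled_keys(5, seed=0))
```

Using the same key for the image record and the float record is what keeps the two databases aligned when they are read back in key order.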

-
lol! :D In fact, your reply was just 5 months old, but with the current pace of deep learning, even that has to be called 'too old'. That's hard, but great! :) – Anoop K. Prabhu Jan 05 '16 at 11:50
-
Thank you, but I don't know Python, so I cannot follow this code; I am using C++. Do you have C++ code instead of Python? I need it. – guochan zhang Jan 05 '16 at 14:49
-
I was using C++ until a month ago and had absolutely no idea about Python. Now I do almost all the preprocessing in Python and feed it to LMDB or HDF5. The above code is almost plug-and-play. After all, SO is not for spoon-feeding; its intention is to give input whenever a programmer gets stuck :) – Anoop K. Prabhu Jan 05 '16 at 14:54
-
Thank you, but I don't know Python, so I cannot follow this code; I am using C++. Do you have C++ code instead of Python? I need it. In fact, I found code like [this](https://github.com/s9xie/DSN/blob/master/cifar-float-extra/convert_cifar_float_data.cpp), but it has parts that I don't understand. What are datum.add_float_data(0) and datum.set_float_data(...)? And what is the difference between label and key? – guochan zhang Jan 05 '16 at 14:59