I am building a CBIR (content-based image retrieval) application using features extracted from a deep convnet. The feature vectors are quite big (about 100,000 elements each), and the dataset has more than 10k images. I have already gone through the answer to this problem, and I don't want to use the libraries mentioned there.

I tried cPickle and HDF5 for storing the feature vectors. I am running it on a PC with 4 GB of RAM and a 2 GHz Intel Core i3 processor.
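For a rough sense of scale, the figures above can be turned into a back-of-the-envelope size estimate. This is only a sketch: the 4 bytes per value is an assumption (float32), since the question does not state the dtype, and the counts are the approximate ones quoted above.

```python
# Rough storage estimate for the full feature index.
n_images = 10_000        # "more than 10k images"
dim = 100_000            # "about 100,000" values per feature vector
bytes_per_value = 4      # ASSUMPTION: float32; the real dtype is not stated

total_bytes = n_images * dim * bytes_per_value
total_gib = total_bytes / 2**30
per_vector_kib = dim * bytes_per_value / 2**10

print(f"{total_gib:.2f} GiB total, {per_vector_kib:.0f} KiB per vector")
```

Under that assumption the complete index comes to roughly 3.7 GiB, which is about the size of the machine's entire RAM, so the features cannot comfortably be held in memory all at once.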

The following piece of code builds the index:

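import glob
import os
import h5py

# open the index in a context manager so it is flushed and closed on exit
with h5py.File(index_file, 'w') as h:
    for imagePath in glob.glob(args["dataset"] + "/*.*"):
        # extract our unique image ID (i.e. the filename); basename also handles Windows separators
        k = os.path.basename(imagePath)
        features = get_features(imagePath, args["layer"])
        h.create_dataset(k, data=features)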

Whenever I run the program to build the index for my dataset of images, I get the error "Python.exe has stopped working" after around 16 MB of the index file has been created. I am new to HDF5 and the answer may be trivial, but any help would be deeply appreciated.

Arko1696
  • Consider using only a small sample of the dataset first, or some dummy data of smaller size. Although it's hard to be sure with that information alone, it looks odd that you are creating a dataset for each image. Wouldn't you rather insert all features into a single row-indexed dataset? – E_net4 Jan 23 '18 at 16:49
  • Yes, I did try with a subset of 130 images from the original. It took around 10 minutes to finish building the index file, and it worked fine. Also, I am creating a new dataset each time because I need to map the image file name to the feature vector for later retrieval. I am not sure if that is possible within a single dataset with row-based indexing (maybe by keeping another dictionary that maps the file names to the indices). But I am not sure if that might be causing the problem. – Arko1696 Jan 23 '18 at 18:11
  • I find it unlikely that HDF5 is optimized for a large number of datasets. It may be better to keep a separate dictionary and rely on a single dataset for the feature vectors. – E_net4 Jan 23 '18 at 18:32
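
The single-dataset layout suggested in the comments could look roughly like the sketch below. It is not a confirmed fix for the crash, only an illustration under stated assumptions: every feature vector has the same length `dim`, values are stored as float32, and `extract` is a hypothetical callable that wraps `get_features` and takes only the image path. The names `build_index`, `load_vector`, `"features"` and `"names"` are likewise made up for this example.

```python
import os

import h5py
import numpy as np


def build_index(index_file, image_paths, extract, dim):
    """Store all vectors in one (n_images, dim) dataset plus a parallel list of file names."""
    names = [os.path.basename(p) for p in image_paths]
    with h5py.File(index_file, "w") as h:
        feats = h.create_dataset("features", shape=(len(names), dim), dtype="float32")
        # Row i of "names" is the file name whose vector sits in row i of "features".
        h.create_dataset("names", data=names, dtype=h5py.string_dtype())
        for row, path in enumerate(image_paths):
            # Write one row at a time so only a single vector is held in memory.
            feats[row] = np.asarray(extract(path), dtype="float32").ravel()
    return names


def load_vector(index_file, name):
    """Look up the row for a file name and read just that vector from disk."""
    with h5py.File(index_file, "r") as h:
        # Depending on the h5py version, stored strings come back as bytes or str.
        names = [n.decode() if isinstance(n, bytes) else n for n in h["names"][:]]
        return h["features"][names.index(name)]
```

The file-name-to-row mapping lives in the `"names"` dataset, so no separate dictionary file is needed; for frequent lookups one could build a `dict` from it once after opening the index.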
