I would like to speed up the forward pass of a CNN classifier using Caffe.
I have tried batch classification in Caffe using the code provided here: "Modifying the Caffe C++ prediction code for multiple inputs". This solution lets me pass a vector of Mat, but it does not speed anything up, even though the input layer is modified.
I am processing fairly small images (3x64x64) on a powerful PC with two GTX 1080 cards, so memory is not an issue. I also tried changing the deploy.prototxt, but I get the same result.
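For reference, my understanding is that the batching step boils down to copying N images into one contiguous N x C x H x W float buffer before a single forward call. Below is a minimal sketch of that layout in plain C++, without Caffe or OpenCV. The `Image` struct and `pack_batch` are hypothetical names used only for illustration, and the interleaved H x W x C input layout is an assumption (it matches how a `cv::Mat` stores its pixels).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a decoded image: interleaved HWC floats
// (assumed layout, as in cv::Mat). Not a Caffe or OpenCV type.
struct Image {
    int channels;
    int height;
    int width;
    std::vector<float> data;  // size == channels * height * width
};

// Copy every image into one contiguous buffer in N x C x H x W order,
// the layout a Caffe blob uses for a batch of N images.
std::vector<float> pack_batch(const std::vector<Image>& images) {
    if (images.empty()) return {};
    const int c = images[0].channels;
    const int h = images[0].height;
    const int w = images[0].width;
    const std::size_t per_image = static_cast<std::size_t>(c) * h * w;

    std::vector<float> blob(images.size() * per_image);
    for (std::size_t n = 0; n < images.size(); ++n) {
        const Image& img = images[n];
        // All images in a batch must share the same shape.
        assert(img.channels == c && img.height == h && img.width == w);
        assert(img.data.size() == per_image);
        for (int ch = 0; ch < c; ++ch) {
            for (int y = 0; y < h; ++y) {
                for (int x = 0; x < w; ++x) {
                    // Source index: interleaved HWC. Destination index: planar CHW.
                    const std::size_t src =
                        (static_cast<std::size_t>(y) * w + x) * c + ch;
                    const std::size_t dst = n * per_image +
                        (static_cast<std::size_t>(ch) * h + y) * w + x;
                    blob[dst] = img.data[src];
                }
            }
        }
    }
    return blob;
}
```

With my 3x64x64 inputs, each image takes 3 * 64 * 64 = 12288 floats in the buffer, so a batch of N images would be N * 12288 floats handed to the network at once.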
It seems that at some point the forward pass of the CNN becomes sequential. Someone else has pointed this out here as well: "Batch processing mode in Caffe - no performance gains".
There is another similar thread for Python: "batch size does not work for caffe with deploy.prototxt".
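One way to check whether batching changes anything is to compare the average time per image at different batch sizes. The following is only a rough sketch of such a measurement: `ms_per_image` is a hypothetical helper, and `forward` is a placeholder for whatever callable runs one forward pass on a batch. It assumes the callable reads the network output back on the host before returning, so that any GPU work has actually finished when the timer stops.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Average wall-clock milliseconds per image. `forward` is a placeholder
// callable expected to run one forward pass on `batch_size` images.
double ms_per_image(const std::function<void()>& forward,
                    std::size_t batch_size, int repeats) {
    if (batch_size == 0 || repeats <= 0) return 0.0;

    forward();  // warm-up run, excluded from the timing

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        forward();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / (static_cast<double>(repeats) * batch_size);
}
```

If batching were effective, I would expect the value returned for a larger batch size to be noticeably lower than for a batch size of 1; in my case the numbers stay roughly the same.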
I have seen some things about MemoryDataLayer, but I am not sure this will solve my problem.
So I am somewhat lost on what to do next. Does anyone have any information on how to speed up classification time? Thanks for any help!