
I'm trying to make my classification process a bit faster. I thought increasing the first input_dim in my deploy.prototxt would speed things up, but it does not seem to work: batched classification is even a little slower than classifying each image one by one.

deploy.prototxt

input: "data"  
input_dim: 128  
input_dim: 1  
input_dim: 120  
input_dim: 160  
... net description ...
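For reference, the same batched input can also be declared with the explicit Input layer syntax of newer Caffe versions; this is just an equivalent way of writing the four input_dim lines above:

layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 128 dim: 1 dim: 120 dim: 160 } }
}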

python net initialization

net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)
net.blobs['data'].reshape(128, 1, 120, 160)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
#transformer settings
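The actual transformer settings are skipped above; for a single-channel model they might look roughly like the following sketch (illustrative values, not the author's real configuration):

transformer.set_transpose('data', (2, 0, 1))  # H x W x C -> C x H x W
transformer.set_raw_scale('data', 255.0)      # load_image returns floats in [0, 1]; rescale if the model expects [0, 255]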

python classification

# load 128 grayscale images (color=False)
images = [None] * 128
for i in range(len(images)):
    images[i] = caffe.io.load_image('image_path', False)

# fill one slice of the input blob per image, then run a single forward pass
for j in range(len(images)):
    net.blobs['data'].data[j, :, :, :] = transformer.preprocess('data', images[j])
out = net.forward()['prob']

I skipped some details, but the important parts should be there. I tried different batch sizes, such as 32, 64, ..., 1024, but the results are all nearly the same. So my question is: does anyone have an idea what I'm doing wrong or what needs to be changed? Thanks for the help!
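For context, a generic outer loop that pushes a whole image set through the network batch by batch could look roughly like this sketch (image_paths and batch_size are illustrative placeholders, not taken from the code above; net and transformer are assumed to be set up as in the initialization snippet):

batch_size = 128
probs = []
for start in range(0, len(image_paths), batch_size):
    batch_paths = image_paths[start:start + batch_size]
    # shrink the input blob for a partial last batch
    net.blobs['data'].reshape(len(batch_paths), 1, 120, 160)
    for j, path in enumerate(batch_paths):
        img = caffe.io.load_image(path, False)
        net.blobs['data'].data[j, :, :, :] = transformer.preprocess('data', img)
    out = net.forward()['prob']
    # copy, because the output blob is overwritten by the next forward pass
    probs.extend(out.copy())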

EDIT:
Some timing results; the avg-times are just the total times divided by the number of processed images (1044).

Batch size: 1

2016-05-04 10:51:20,721 - detector - INFO - data shape: (1, 1, 120, 160)
2016-05-04 10:51:35,149 - main - INFO - GPU timings:
2016-05-04 10:51:35,149 - main - INFO - processed images: 1044
2016-05-04 10:51:35,149 - main - INFO - total-time: 14.43s
2016-05-04 10:51:35,149 - main - INFO - avg-time: 13.82ms
2016-05-04 10:51:35,149 - main - INFO - load-time: 8.31s
2016-05-04 10:51:35,149 - main - INFO - avg-load-time: 7.96ms
2016-05-04 10:51:35,149 - main - INFO - classify-time: 5.99s
2016-05-04 10:51:35,149 - main - INFO - avg-classify-time: 5.74ms

Batch size: 32

2016-05-04 10:52:30,773 - detector - INFO - data shape: (32, 1, 120, 160)
2016-05-04 10:52:45,135 - main - INFO - GPU timings:
2016-05-04 10:52:45,135 - main - INFO - processed images: 1044
2016-05-04 10:52:45,135 - main - INFO - total-time: 14.36s
2016-05-04 10:52:45,136 - main - INFO - avg-time: 13.76ms
2016-05-04 10:52:45,136 - main - INFO - load-time: 7.13s
2016-05-04 10:52:45,136 - main - INFO - avg-load-time: 6.83ms
2016-05-04 10:52:45,136 - main - INFO - classify-time: 7.13s
2016-05-04 10:52:45,136 - main - INFO - avg-classify-time: 6.83ms

Batch size: 128

2016-05-04 10:53:17,478 - detector - INFO - data shape: (128, 1, 120, 160)
2016-05-04 10:53:31,299 - main - INFO - GPU timings:
2016-05-04 10:53:31,299 - main - INFO - processed images: 1044
2016-05-04 10:53:31,299 - main - INFO - total-time: 13.82s
2016-05-04 10:53:31,299 - main - INFO - avg-time: 13.24ms
2016-05-04 10:53:31,299 - main - INFO - load-time: 7.06s
2016-05-04 10:53:31,299 - main - INFO - avg-load-time: 6.77ms
2016-05-04 10:53:31,299 - main - INFO - classify-time: 6.66s
2016-05-04 10:53:31,299 - main - INFO - avg-classify-time: 6.38ms

Batch size: 1024

2016-05-04 10:54:11,546 - detector - INFO - data shape: (1024, 1, 120, 160)
2016-05-04 10:54:25,316 - main - INFO - GPU timings:
2016-05-04 10:54:25,316 - main - INFO - processed images: 1044
2016-05-04 10:54:25,316 - main - INFO - total-time: 13.77s
2016-05-04 10:54:25,316 - main - INFO - avg-time: 13.19ms
2016-05-04 10:54:25,316 - main - INFO - load-time: 7.04s
2016-05-04 10:54:25,316 - main - INFO - avg-load-time: 6.75ms
2016-05-04 10:54:25,316 - main - INFO - classify-time: 6.63s
2016-05-04 10:54:25,316 - main - INFO - avg-classify-time: 6.35ms

  • are you using GPU or CPU? – Shai May 03 '16 at 12:19
  • I'm using GPU: nvidia GTX980 Ti – Feuerteufel May 03 '16 at 12:23
  • 2
    what do you mean by "all nearly the same"? the runtime of `net.forward()` is the same regardless of `batch_size`, or the runtime *divided* by `batch_size` is "nearly the same"? can you put some numbers here? – Shai May 03 '16 at 12:38
  • After a specific batch size, say 32, the forward time will become almost constant. This is happen as the GPU is fully utilized. If you keep increasing, you will reach a point where you fall short of memory requirement. It is much better to post results. – Qazi May 04 '16 at 06:45
  • I added some timing results. – Feuerteufel May 04 '16 at 09:02
  • It seems nobody really knows how to handle that in Python. I thought of making a Caffe database with the 'convert_imageset' tool, but I don't know how to use it with finer control than the train tool; I need access to the output for every single image. But I don't know how to run single batches with Python. – Feuerteufel May 31 '16 at 09:23

1 Answer


I'm pretty sure the problem is in these lines:

for j in range(len(images)):
    net.blobs['data'].data[j, :, :, :] = transformer.preprocess('data', images[j])
out = net.forward()['prob']

Doing this will simply set the single image from the last iteration of the for loop as the network's only input. Try stacking the N images (say, into stackedimages) beforehand and calling that line only once, e.g.:

for j in range(len(images)):
    stackedimages <- transformer.preprocess('data', images[j])

Then call:

net.blobs['data'].data[...] = stackedimages
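
To make this concrete, a minimal sketch of the stacking (assuming numpy is imported as np; stackedimages is just an illustrative name) could be:

import numpy as np

# preprocess every image once; each result is a C x H x W array
preprocessed = [transformer.preprocess('data', img) for img in images]

# stack along a new batch axis to get an N x C x H x W array
stackedimages = np.stack(preprocessed, axis=0)

# copy the whole batch into the input blob in one assignment
net.blobs['data'].data[...] = stackedimages
out = net.forward()['prob']  # one probability vector per image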
  • I don't think this is an issue. At each iteration a different slice of `['data'].data` is set, and `forward()` is called only after all slices were assigned. – Shai May 04 '16 at 05:21
  • I'm with @Shai. In `['data'].data` each index j should hold the appropriately transformed image at the end of the for loop. Nevertheless I have a question: what is the needed data structure for `stackedimages`? At the moment all the images are in a list, but a change would be possible. Or would `net.blobs['data'].data[...] = images` be possible? – Feuerteufel May 04 '16 at 08:43
  • Ah yes, I see, you are right. `net.blobs['data'].data[...] = images` would work fine if the `images` are all already transformed. Just make sure they are appropriately stacked to form a 4D blob of the same dimensions. – Prophecies May 04 '16 at 15:24