Can CNNs be faster than classic descriptors?

Question

Disclaimer: I know almost nothing about CNNs and I have no idea where else I could ask this.

My research focuses on high-performance computer vision applications. We generate a code representing an image in less than 20 ms, for images whose largest side is 500 px.

This is done by combining SURF descriptors and VLAD codes, obtaining a vector representing an image that will be used in our object recognition application.

Can CNNs be faster? According to this benchmark (which is based on much smaller images), the time needed is longer: almost double, even though the images are half the size of ours.

score 2 · Answer 1 · answered Apr 02 '17 at 20:32

Yes, they can be faster. The numbers you got are for networks trained for ImageNet classification (about 1 million images, 1000 classes). Unless your classification problem is similar, using an ImageNet network is overkill.

You should also remember that these networks have on the order of 10-100 million weights, so they are quite expensive to evaluate. But you probably don't need a really big network, and you can design your own network with fewer layers and parameters that is much cheaper to evaluate.

In my own experience, I designed a network to classify 96x96 sonar image patches. With around 4000 weights in total, it gets over 95% classification accuracy and runs at 40 ms per frame on an RPi2.

A bigger network with 900K weights and the same input size takes 7 ms to evaluate on a Core i7. So this is surely possible; you just need to play with smaller network architectures. A good start is SqueezeNet, which achieves AlexNet-level accuracy on ImageNet with 50 times fewer weights, and it is of course much faster than larger networks.
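To make the weight counts above concrete, here is a minimal sketch of how the number of parameters in a small CNN adds up. The architecture (three 3x3 convolution stages with 4, 8 and 8 filters, each followed by 2x2 max-pooling, then one dense layer) is a hypothetical example chosen for illustration; it is not the sonar network described above, and the helper names are made up.

```python
# Hypothetical architecture for illustration only; not the network from the answer.

def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights of a square convolution plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weights of a fully connected layer plus one bias per output."""
    return in_features * out_features + out_features


def conv_out_size(size, kernel_size, pool=2):
    """Spatial size after a 'valid' convolution followed by pool x pool max-pooling."""
    return (size - kernel_size + 1) // pool


def count_small_cnn(input_size=96, num_classes=2):
    """Total parameter count of the assumed three-stage CNN on a grayscale input."""
    total = 0
    size, channels = input_size, 1  # single-channel input, e.g. a 96x96 patch
    for out_channels in (4, 8, 8):  # three conv + pool stages
        total += conv2d_params(channels, out_channels, 3)
        size = conv_out_size(size, 3)
        channels = out_channels
    # Flatten the final feature map and classify with a single dense layer.
    total += dense_params(size * size * channels, num_classes)
    return total


print(count_small_cnn())                  # 2522
print(count_small_cnn(num_classes=1000))  # 801920
```

In this sketch the convolutions contribute fewer than a thousand parameters; almost all of the count comes from the final dense layer, which is why the number of output classes and the size of the last feature map dominate the cost of such small networks.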

Dr. Snoopy

Thanks for your answer. Sorry if I say something wrong (as I said, I know pretty much nothing about CNNs), but from my understanding it's the classic precision/speed tradeoff, is that correct? And the times that you are reporting cover the whole computation, from the query image to the produced code, right? Or is there also some other time to consider? Anyway, we are using this for object recognition and eventually image retrieval, so it is not classification. – justHelloWorld Apr 02 '17 at 20:38
@justHelloWorld Object recognition is pretty much the same as image classification. It is a speed/accuracy tradeoff, but it's more like you lose 0.5% accuracy and gain a 40x speedup. The time I report is for one forward pass, and could be less if you only need the features. – Dr. Snoopy Apr 02 '17 at 20:43
Thanks for your answer again. We are trying to implement a cache for this kind of application, so performance is crucial. An important point is that our proposed cache approach is based on metric spaces, so it is very important that the similarity between two CNN codes is evaluated with a metric distance. Is that possible in your opinion? PS: seriously, thanks for all the help :D – justHelloWorld Apr 02 '17 at 20:47
For this reason we were using VLAD codes: they are very compact (so it's very fast to compute the similarity between them), they are based on classic descriptors, they are very fast to generate and they use the L2 distance. – justHelloWorld Apr 02 '17 at 20:49
@justHelloWorld Yes, there are CNNs that do exactly that, learn an embedding with meaningful distances. See FaceNet for example. This is just part of Metric learning. – Dr. Snoopy Apr 02 '17 at 20:50
Another very important point is that, to decide between a cache hit and a cache miss, we need a threshold value (if the distance to the closest cached image is less than the threshold, then there is a cache hit; otherwise it's a cache miss). This is usually very tricky for VLAD codes, because the distance is query-dependent, so it's not obvious how to find a meaningful threshold. Luckily, some work has been proposed on that. Is finding a threshold also a problem with CNNs? – justHelloWorld Apr 02 '17 at 20:55
Thanks so much :) – justHelloWorld Apr 02 '17 at 20:57
Could you please take a look at [this](https://stackoverflow.com/questions/44684463/how-to-interpret-this-cnn-benchmarks) question? – justHelloWorld Jun 25 '17 at 13:43
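The threshold-based cache lookup discussed in the comments above can be sketched as follows. This is only an illustration under stated assumptions: codes are plain lists of floats, the cache is a dictionary scanned linearly, and the names `l2_distance` and `lookup` are hypothetical. It works the same way whether the codes come from VLAD or from a CNN embedding, as long as they are compared with the L2 distance.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length codes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lookup(cache, query, threshold):
    """Return (key, distance) for a cache hit, or None for a cache miss.

    A hit means the nearest cached code is closer than `threshold`.
    """
    best_key, best_dist = None, math.inf
    for key, code in cache.items():  # linear scan; fine for a small sketch
        dist = l2_distance(code, query)
        if dist < best_dist:
            best_key, best_dist = key, dist
    if best_dist < threshold:
        return best_key, best_dist
    return None


# Toy 2-D codes; real codes would have many more dimensions.
cache = {"image_a": [0.0, 0.0], "image_b": [3.0, 4.0]}
print(lookup(cache, [0.0, 1.0], threshold=1.5))    # ('image_a', 1.0)
print(lookup(cache, [10.0, 10.0], threshold=1.5))  # None
```

Choosing the threshold value itself is the hard part raised in the comments, and the sketch does not address it.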

score 1 · Answer 2 · answered Apr 02 '17 at 13:21

I would be wary of benchmarks and blanket statements. It's important to know every detail that went into generating the quoted values. For example, would running CNN on GPU hardware improve the quoted values?
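One way to avoid relying on someone else's benchmark is to time both pipelines yourself, on the same hardware and the same images. Below is a minimal sketch of such a timing harness using only the Python standard library. The function name `median_latency_ms` and the stand-in `fake_extract` are hypothetical; in practice you would pass in your own SURF+VLAD or CNN feature extractor.

```python
import statistics
import time


def median_latency_ms(fn, inputs, warmup=3, repeats=20):
    """Median wall-clock time of fn(x) in milliseconds over all inputs."""
    # Warm-up runs are discarded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        for x in inputs:
            fn(x)
    samples = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            fn(x)
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def fake_extract(image):
    """Hypothetical stand-in for a real feature extractor."""
    return sum(image) / len(image)


images = [[float(i) for i in range(1000)] for _ in range(5)]
print(f"median latency: {median_latency_ms(fake_extract, images):.4f} ms")
```

The median is used instead of the mean so that a few slow outliers do not dominate. For GPU code you would also need to make sure the work has actually finished before stopping the timer, which this sketch does not handle.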

20ms seems very fast to me; so does 40ms. I have no idea what your requirement is.

What other benefits could CNN offer? Maybe it's more than just raw speed.

I don't believe that neural networks are the perfect technique for every problem. Regression, SVM, and other classification techniques are still viable.

There's a bias at work here. Your question reads as if you are looking only to confirm that your current research is best. You have a sunk cost that you're loath to throw away, but you're worried that there might be something better out there. If that's true, I don't think this is a good question for SO.

"I don't know almost nothing on CNNs" - if you're a true researcher, seeking the truth, I think you have an obligation to learn and answer for yourself. TensorFlow and Keras make this easy to do.

duffymo

Thanks for your useful answer. If you follow the link, you can see that these tests are performed using high-end GPUs. – justHelloWorld Apr 02 '17 at 13:31
I totally agree with you about it, but this is a master's thesis research project and learning CNNs would require more than the research itself. – justHelloWorld Apr 02 '17 at 13:32
I see your point. A master's thesis need not reach the standard of a doctoral dissertation for originality and scope. I think you're safe with the work you've done. I disagree with your assessment of CNNs. What I see of TensorFlow and Keras would bring them within reach. The best answer will come from your adviser. I'd suggest a meeting. – duffymo Apr 02 '17 at 13:42
Thanks so much. It's strange how a few words from someone you met on Stack Overflow can be much more reassuring than words from someone you know well in person. – justHelloWorld Apr 02 '17 at 13:49
I'm glad to help. Good luck. Your work sounds very interesting. The good news for you is that learning need not stop once your degree is awarded. There's nothing to stop you from diving into CNNs once your degree requirements are fulfilled. – duffymo Apr 02 '17 at 13:57

Martin Thoma · Answer 3 · 2017-04-03T07:54:35.247

Answer to your question: Yes, they can. They can be slower and they can be faster than classic descriptors. For example, using only a single filter and several max-poolings will almost certainly be faster. But the results will also certainly be crappy.

You should ask a much more specific question. Relevant parts are:

Problem: Classification / Detection / Semantic Segmentation / Instance Segmentation / Face verification / ... ?
Constraints: Minimum accuracy / maximum speed / maximum latency?
Evaluation specifics:
- Which hardware is available (GPUs)?
- Do you evaluate on a single image? Often you can evaluate up to 512 images in about the same time as one image.

Also: The input image size should not be relevant. If CNNs achieve better results on smaller inputs than classic descriptors, why should you care?

Papers

Please note that CNNs are usually not tweaked towards speed, but towards accuracy.

Detection: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks: 600px x ~800px in 200ms on a GPU
InverseFaceNet: Deep Single-Shot Inverse Face Rendering From A Single Image: 9.79ms with GeForce GTX Titan and AlexNet to get FC7 features
Semantic segmentation: Pixel-wise Segmentation of Street with Neural Networks: 20ms with GeForce GTX 980

That's a hell of an answer! Thank you so much! I really needed some papers as references! The benchmark that I linked says that the images are 224x224 in each case considered, and that GPUs are used to evaluate the speed. As I already said in my previous comments, speed is crucial in my application. – justHelloWorld Apr 03 '17 at 08:18
