Disclaimer: I know almost nothing about CNNs, and I have no idea where else I could ask this.
My research focuses on high-performance computer vision applications. We generate a code representing an image in less than 20 ms, on images whose largest dimension is 500 px.
This is done by combining SURF descriptors with VLAD encoding, which yields a single vector per image that is then used in our object recognition application.
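
As a rough illustration of the VLAD step (a sketch only, not the actual implementation): each local descriptor is assigned to its nearest visual word, the residuals are summed per word, and the result is concatenated and L2-normalised. The function name `vlad_encode`, the plain-list inputs, and the choice of plain L2 normalisation are assumptions made for this example; a real pipeline would take 64-D or 128-D SURF descriptors and a k-means codebook, and would normally use NumPy or OpenCV rather than pure Python.

```python
import math


def vlad_encode(descriptors, centroids):
    """Encode local descriptors into one VLAD vector (illustrative sketch).

    descriptors: list of D-dimensional local descriptors (e.g. SURF).
    centroids:   list of K D-dimensional visual words (e.g. from k-means).
    Returns a flat list of length K * D.
    """
    k, d = len(centroids), len(centroids[0])
    residual_sums = [[0.0] * d for _ in range(k)]

    for desc in descriptors:
        # Assign the descriptor to its nearest centroid (squared L2 distance).
        nearest = min(
            range(k),
            key=lambda i: sum((x - c) ** 2 for x, c in zip(desc, centroids[i])),
        )
        # Accumulate the residual (descriptor minus centroid) for that word.
        for j in range(d):
            residual_sums[nearest][j] += desc[j] - centroids[nearest][j]

    # Concatenate the per-centroid residual sums into a single vector.
    vlad = [value for row in residual_sums for value in row]

    # L2-normalise so images with different descriptor counts are comparable.
    norm = math.sqrt(sum(value * value for value in vlad))
    return [value / norm for value in vlad] if norm > 0 else vlad


# Toy usage: three 2-D "descriptors" and a codebook of two visual words.
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]]
toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
image_code = vlad_encode(toy_descriptors, toy_centroids)
print(len(image_code))  # K * D entries
```

The output length depends only on the codebook size and descriptor dimension, not on how many keypoints the image has, which is what makes the code usable as a fixed-size image representation.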
Can CNNs be faster? According to this benchmark (which is based on much smaller images), the time needed is longer, and effectively almost double once you consider that those images are half the size of ours.
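
Since published benchmarks use different image sizes and hardware, a like-for-like comparison probably needs both methods timed on the same images. A minimal timing sketch, assuming a hypothetical callable `encode` that stands in for either the SURF+VLAD pipeline or a CNN forward pass (neither is implemented here):

```python
import statistics
import time


def median_ms(encode, images, repeats=5):
    """Median wall-clock time of encode(image), in milliseconds.

    encode:  hypothetical callable mapping one image to its code.
    images:  non-empty list of inputs to time.
    repeats: number of timed calls per image.
    """
    timings = []
    for image in images:
        for _ in range(repeats):
            start = time.perf_counter()
            encode(image)
            timings.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to warm-up and scheduling spikes than the mean.
    return statistics.median(timings)


# Toy usage with a placeholder "encoder" that just sums the pixel values.
fake_images = [[1, 2, 3]] * 3
print(median_ms(lambda img: sum(img), fake_images, repeats=2))
```

Running the same harness over the same 500 px images for both approaches would make the 20 ms figure directly comparable.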