
I've been playing with the excellent GPUImage library, which implements several feature detectors: Harris, FAST, Shi-Tomasi, and Noble. However, none of those implementations help with the feature extraction and matching part; they simply output a set of detected corner points.

My understanding (which is shaky) is that the next step would be to examine each of those detected corner points and extract the feature from them, which would result in a descriptor, i.e. a 32- or 64-bit number that could be used to index the point near other, similar points.

From reading Chapter 4.1 of [Computer Vision: Algorithms and Applications, Szeliski], I understand that a best-bin-first approach would help to efficiently find neighbouring features to match, etc. However, I don't actually know how to do this, and I'm looking for some example code that does it.

I've found this project [https://github.com/Moodstocks/sift-gpu-iphone], which claims to implement as much of the feature extraction as possible on the GPU. I've also seen some discussion indicating that it might generate buggy descriptors.

And in any case, that code doesn't go on to show how the extracted features would best be matched against another image.

My use case is trying to find objects in an image.

Does anyone have any code that does this, or at least a good implementation that shows how the extracted features are matched? I'm hoping not to have to rewrite the whole set of algorithms.

thanks, Rob.

Rob
  • this may help – it's GPU code doing neural net computations (unfortunately with probably some Swift 2 code rot): https://github.com/johndpope/espresso/blob/master/EspressoHostApp/EspressoHostApp/Shaders.metal / https://github.com/codinfox/espresso/blob/master/espresso/espresso/Network.swift – johndpope Sep 13 '18 at 16:03

3 Answers


First, you need to be careful with SIFT implementations, because the SIFT algorithm is patented and the owners of those patents require license fees for its use. I've intentionally avoided using that algorithm for anything as a result.

Finding good feature detection and extraction methods that also work well on a GPU is a little tricky. The Harris, Shi-Tomasi, and Noble corner detectors in GPUImage are all derivatives of the same base operation, and probably aren't the fastest way to identify features.

As you can tell, my FAST corner detector isn't operational yet. The idea there is to use a lookup texture based on a local binary pattern (which is why I built that filter first, to test the concept), and to have that texture return whether or not a point is a corner. That should be much faster than the Harris, etc. corner detectors. I also need to finish my histogram pyramid point extractor so that feature extraction isn't done in an extremely slow loop on the GPU.

The use of a lookup texture for a FAST corner detector is inspired by this paper by Jaco Cronje on a technique they refer to as BFROST. In addition to using the quick, texture-based lookup for feature detection, the paper proposes using the binary pattern as a quick descriptor for the feature. There's a little more to it than that, but in general that's what they propose.
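This isn't GPUImage code or the paper's exact method, but a minimal CPU-side sketch of the lookup-table idea might look like the following: build a 16-bit binary pattern from the circle of pixels around a candidate point, then decide cornerness with a single precomputed table lookup (on the GPU the table would live in a 256x256 lookup texture). The helper names and the brighter-only test are simplifications of mine; real FAST also tests for a contiguous darker arc.

    #include <array>
    #include <cstdint>

    // Offsets of the 16 pixels on the Bresenham circle of radius 3 used by FAST.
    static const int kCircle[16][2] = {
        { 0,-3},{ 1,-3},{ 2,-2},{ 3,-1},{ 3, 0},{ 3, 1},{ 2, 2},{ 1, 3},
        { 0, 3},{-1, 3},{-2, 2},{-3, 1},{-3, 0},{-3,-1},{-2,-2},{-1,-3}};

    // Does the 16-bit pattern, treated as a ring, contain a run of
    // at least 9 contiguous set bits (the FAST-9 criterion)?
    static bool hasContiguousRun(uint16_t bits, int minRun = 9) {
        uint32_t ring = (uint32_t(bits) << 16) | bits;  // duplicate to handle wrap-around
        int run = 0;
        for (int i = 0; i < 32; ++i) {
            run = ((ring >> i) & 1) ? run + 1 : 0;
            if (run >= minRun) return true;
        }
        return false;
    }

    // Precompute corner/not-corner for all 65,536 possible patterns.
    static std::array<bool, 65536> buildCornerTable() {
        std::array<bool, 65536> table{};
        for (uint32_t p = 0; p < 65536; ++p)
            table[p] = hasContiguousRun(uint16_t(p));
        return table;
    }

    // Binary pattern of circle pixels brighter than the centre by a threshold.
    // (x, y) must be at least 3 pixels from the image border.
    static uint16_t patternAt(const uint8_t *img, int width, int x, int y, int t) {
        uint16_t bits = 0;
        const int centre = img[y * width + x];
        for (int i = 0; i < 16; ++i) {
            int v = img[(y + kCircle[i][1]) * width + (x + kCircle[i][0])];
            if (v > centre + t) bits |= uint16_t(1) << i;
        }
        return bits;
    }

    // Usage: bool isCorner = table[patternAt(img, width, x, y, threshold)];

Note also that, as the paper suggests, the pattern itself can double as a cheap binary descriptor for the point.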

Feature matching is done by Hamming distance, but while there are quick CPU-side and CUDA instructions for calculating that, OpenGL ES doesn't have one. A different approach might be required there. Similarly, I don't have a good solution for finding a best match between groups of features beyond something CPU-side, but I haven't thought that far yet.
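As a point of reference for the CPU side, a brute-force Hamming matcher is only a few lines. This is a minimal sketch of mine, assuming 256-bit binary descriptors packed into four 64-bit words, and using the GCC/Clang __builtin_popcountll intrinsic (which compiles down to the hardware population-count instruction where available):

    #include <cstdint>
    #include <vector>

    struct Descriptor { uint64_t words[4]; };  // a 256-bit binary descriptor

    // Number of differing bits between two descriptors.
    static int hammingDistance(const Descriptor &a, const Descriptor &b) {
        int d = 0;
        for (int i = 0; i < 4; ++i)
            d += __builtin_popcountll(a.words[i] ^ b.words[i]);
        return d;
    }

    // For each query descriptor, the index of its nearest neighbour in 'train'.
    static std::vector<int> bruteForceMatch(const std::vector<Descriptor> &query,
                                            const std::vector<Descriptor> &train) {
        std::vector<int> matches(query.size(), -1);
        for (size_t i = 0; i < query.size(); ++i) {
            int best = 1 << 30;
            for (size_t j = 0; j < train.size(); ++j) {
                int d = hammingDistance(query[i], train[j]);
                if (d < best) { best = d; matches[i] = int(j); }
            }
        }
        return matches;
    }

For a few hundred features per frame, the quadratic loop above is usually fast enough on the CPU that the lack of a GPU-side popcount may not matter much in practice.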

It is a primary goal of mine to have this in the framework (it's one of the reasons I built it), but I haven't had the time to work on this lately. The above are at least my thoughts on how I would approach this, but I warn you that this will not be easy to implement.

Brad Larson
  • Yes, I've been following your work over on the GPUImage project as well, trolling those tickets as well, so I did find the ticket about implementing FAST. As a next (baby) step, I'm noticing that the current detectors don't extract the features. It looks to me like they need to feed their output into another filter that creates the descriptors and indexes them. At that point, I think I could do the algorithm in the CPU, since I'm dealing only with hundreds of features, not thousands. Do you think that is sensible? – Rob Mar 04 '14 at 00:09
  • Have you seen this library? https://www.vuforia.com/platform. Seems to also detect objects in an image, using OpenGL. Apparently free, no fees. – Rob Mar 04 '14 at 17:40
  • SIFT patent is expired. – wcochran Nov 14 '21 at 20:49

For object recognition these days (as of a couple of weeks ago), it's best to use TensorFlow / convolutional neural networks for this. Apple recently added some Metal sample code: https://developer.apple.com/library/content/samplecode/MetalImageRecognition/Introduction/Intro.html#//apple_ref/doc/uid/TP40017385

To do feature detection within an image, I draw your attention to an out-of-the-box option: the KAZE/AKAZE algorithm in OpenCV. http://www.robesafe.com/personal/pablo.alcantarilla/kaze.html

For iOS, I glued the AKAZE class together with another stitching sample to illustrate:

    #include <opencv2/features2d.hpp>

    cv::Mat mat;                           // input image
    std::vector<cv::KeyPoint> keypoints;

    cv::Ptr<cv::AKAZE> detector = cv::AKAZE::create();
    detector->detect(mat, keypoints);      // this will find the keypoints
    cv::drawKeypoints(mat, keypoints, mat);

    // one of the resulting keypoints, as shown in the debugger
    // (the keypoint plus its binary descriptor stands in for a SIFT feature):
    [255] = {
        pt = (x = 645.707153, y = 56.4605064)
        size = 4.80000019
        angle = 0
        response = 0.00223364262
        octave = 0
        class_id = 0
    }

https://github.com/johndpope/OpenCVSwiftStitch
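The snippet above only detects keypoints; to cover the matching half of the question, a sketch along the following lines should work with the same OpenCV API. AKAZE's default descriptor (MLDB) is binary, so it is matched with Hamming distance; the 0.8 ratio-test threshold is a conventional choice of mine, not something from the stitching sample:

    #include <opencv2/features2d.hpp>
    #include <vector>

    // Match AKAZE features between two images; returns the ratio-test survivors.
    std::vector<cv::DMatch> matchAKAZE(const cv::Mat &img1, const cv::Mat &img2) {
        cv::Ptr<cv::AKAZE> akaze = cv::AKAZE::create();
        std::vector<cv::KeyPoint> kp1, kp2;
        cv::Mat desc1, desc2;
        akaze->detectAndCompute(img1, cv::noArray(), kp1, desc1);
        akaze->detectAndCompute(img2, cv::noArray(), kp2, desc2);

        // Binary descriptors are compared with Hamming distance.
        cv::BFMatcher matcher(cv::NORM_HAMMING);
        std::vector<std::vector<cv::DMatch>> knn;
        matcher.knnMatch(desc1, desc2, knn, 2);

        // Lowe's ratio test: keep a match only if it is clearly better
        // than the second-best candidate.
        std::vector<cv::DMatch> good;
        for (const auto &pair : knn)
            if (pair.size() == 2 && pair[0].distance < 0.8f * pair[1].distance)
                good.push_back(pair[0]);
        return good;
    }

From the good matches you can pull the matched keypoint coordinates and, for object finding, feed them to cv::findHomography with RANSAC to locate the object in the scene.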


johndpope
  • ah, you're right. I'm actually working on a machine learning algorithm with a mash-up of FAST. From there, GPU acceleration could potentially come out of the box with Core ML + a trained model. https://github.com/johndpope/Bresenham-Line – johndpope Sep 05 '18 at 16:23

Here is a GPU-accelerated SIFT feature extractor:

https://github.com/lukevanin/SIFTMetal

The code is written in Swift 5 and uses Metal compute shaders for most operations (scaling, Gaussian blur, keypoint detection and interpolation, feature extraction). The implementation is largely based on the paper and code from the "Anatomy of the SIFT Method" article published in Image Processing On Line (IPOL) in 2014 (http://www.ipol.im/pub/art/2014/82/). Some parts are based on code by Rob Hess (https://github.com/robwhess/opensift), which I believe is now used in OpenCV.

For feature matching I tried using a k-d tree with the best-bin-first (BBF) method proposed by David Lowe. While BBF does provide some benefit up to about 10 dimensions, at the higher dimensionality used by SIFT (128 dimensions) it is no better than a quadratic search, due to the "curse of dimensionality". That is to say, if you compare 1,000 descriptors against 1,000 other descriptors, it still ends up making 1,000 x 1,000 = 1,000,000 comparisons, the same as brute-force pairwise matching.

In the linked code I use a different approach, optimised for performance over accuracy: a trie to locate the general vicinity of potential neighbours, then a search over a fixed number of sibling leaf nodes for the nearest neighbours. In practice this matches about 50% of the descriptors, but only makes 1,000 x 20 = 20,000 comparisons, which is about 50x faster and scales linearly instead of quadratically.
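I haven't reproduced the library's matcher here, but the bucketing idea can be illustrated with a toy CPU version; everything below (the names, the 2-bits-per-dimension quantisation of the leading dimensions, the L1 distance) is an assumption chosen for the sketch, not the actual SIFTMetal code:

    #include <array>
    #include <cstdint>
    #include <cstdlib>
    #include <map>
    #include <vector>

    using Desc = std::array<uint8_t, 128>;  // a SIFT descriptor, one byte per dimension

    // Quantize the first 8 dimensions to 2 bits each -> a 16-bit trie path.
    // Descriptors with similar leading dimensions land in nearby leaves.
    static uint16_t triePath(const Desc &d) {
        uint16_t key = 0;
        for (int i = 0; i < 8; ++i) key = uint16_t((key << 2) | (d[i] >> 6));
        return key;
    }

    static int l1Distance(const Desc &a, const Desc &b) {
        int sum = 0;
        for (int i = 0; i < 128; ++i) sum += std::abs(int(a[i]) - int(b[i]));
        return sum;
    }

    struct TrieIndex {
        // std::map keeps leaves sorted by path, so sibling leaves are adjacent.
        std::map<uint16_t, std::vector<const Desc*>> leaves;

        // Descriptors must outlive the index; only pointers are stored.
        void insert(const Desc &d) { leaves[triePath(d)].push_back(&d); }

        // Search the query's leaf plus up to maxLeaves neighbouring leaves
        // instead of the whole database.
        const Desc *nearest(const Desc &q, int maxLeaves = 8) const {
            auto centre = leaves.lower_bound(triePath(q));
            auto lo = centre, hi = centre;
            for (int i = 0; i < maxLeaves / 2; ++i) {
                if (lo != leaves.begin()) --lo;
                if (hi != leaves.end()) ++hi;
            }
            const Desc *best = nullptr;
            int bestDist = 1 << 30;
            for (auto it = lo; it != hi; ++it)
                for (const Desc *cand : it->second) {
                    int dist = l1Distance(q, *cand);
                    if (dist < bestDist) { bestDist = dist; best = cand; }
                }
            return best;  // may miss the true nearest neighbour; that is the trade-off
        }
    };

The accuracy/speed dial is the number of leaves visited: searching more sibling leaves finds more of the true neighbours, at the cost of more comparisons.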

I am still testing and refining the code. Hopefully it helps someone.

Luke Van In