Use RANSAC. Let your model be a combination of rotation and horizontal/vertical shift.
An affine transformation would also work (it has slightly more degrees of freedom than necessary, but it is well described).
For each matched point pair i, where (x1, y1) is the point in the first image and (x2, y2) its match in the second, let A_i = [x1 y1 0 0 1 0; 0 0 x1 y1 0 1] and b_i = [x2; y2]. Stacking these gives A and b of dimensions 2n-by-6 and 2n-by-1, where n is the number of point pairs.
Then solve A a = b for the parameter vector a = [a1; ...; a6] using least squares.
Your affine transformation T is then
[a1 a2 a5
 a3 a4 a6
 0  0  1].
a5 and a6 give the shift, which you don't care about. The entries a1 to a4 describe the rotation matrix (caveat: if there is a bit of zoom as well, it is not purely a rotation; in that case normalise the rows to make them orthonormal).
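As a minimal sketch in MATLAB (plain least squares, no RANSAC yet; fitAffine and its calling convention are my own, not from any toolbox):

    function T = fitAffine(p1, p2)
        % Least-squares affine fit from matched points.
        % p1, p2: n-by-2 matrices of (x, y) coordinates in images 1 and 2.
        n = size(p1, 1);
        A = zeros(2*n, 6);
        b = zeros(2*n, 1);
        for i = 1:n
            A(2*i-1, :) = [p1(i,1) p1(i,2) 0 0 1 0];
            A(2*i,   :) = [0 0 p1(i,1) p1(i,2) 0 1];
            b(2*i-1)    = p2(i,1);
            b(2*i)      = p2(i,2);
        end
        a = A \ b;                  % least-squares solution of A*a = b
        T = [a(1) a(2) a(5);
             a(3) a(4) a(6);
             0    0    1  ];
    end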
Anyway, all of this is well described, and implementations exist in both OpenCV and MATLAB (e.g. estimateAffinePartial2D in OpenCV, or estimateGeometricTransform in MATLAB's Computer Vision Toolbox).
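If you want to avoid the toolboxes, a bare-bones RANSAC loop around the fitAffine sketch above could look like this; the iteration count and inlier threshold are placeholder values, not tuned ones:

    function [Tbest, inliers] = ransacAffine(p1, p2, nIter, tol)
        % e.g. nIter = 500, tol = 2 (pixels) -- assumptions, tune per data.
        n = size(p1, 1);
        bestCount = 0;
        inliers = false(n, 1);
        for k = 1:nIter
            s = randperm(n, 3);              % 3 matches determine an affine map
            T = fitAffine(p1(s,:), p2(s,:)); % (degenerate samples will only
                                             %  trigger a singularity warning)
            q = (T * [p1, ones(n,1)]')';     % apply T to all points
            err = sqrt(sum((q(:,1:2) - p2).^2, 2));
            in = err < tol;
            if nnz(in) > bestCount
                bestCount = nnz(in);
                inliers = in;
            end
        end
        Tbest = fitAffine(p1(inliers,:), p2(inliers,:));  % refit on inliers
    end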
Update
I just implemented my approach. It doesn't work. This approach sees so many non-rotating keypoints that the affine transformation proposed by RANSAC only finds a tiny shift. Basically, I find the identity matrix.
Here is an image showing the inliers identified by RANSAC.
Conclusion 1
- Method 1: Identify the arrow using matching, and determine its transformation (affine would do).
- Method 2: Disregard the slight camera movement (or remove it first) and use a purely rotational model.
Method 1
Crop the indicated arrow from the center of rotation all the way to the outer rim. Include the center of rotation, so that its location is known later and the rotation around that point can be calculated.
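One rough way to turn that crop into an angle estimate is to rotate the template over a grid of angles and keep the best normalised cross-correlation. This is a generic rotating-template sketch, assuming the Image Processing Toolbox; the file names are placeholders:

    % Estimate the arrow angle in one frame with a rotating template.
    template = im2double(rgb2gray(imread('arrow_crop.png')));
    frame    = im2double(rgb2gray(imread('frame.png')));
    bestScore = -Inf;
    bestAngle = 0;
    for angle = 0:2:358                          % coarse 2-degree grid
        rot = imrotate(template, angle, 'bilinear', 'crop');
        c = normxcorr2(rot, frame);              % normalised cross-correlation
        if max(c(:)) > bestScore
            bestScore = max(c(:));
            bestAngle = angle;
        end
    end
    fprintf('Estimated rotation: %d degrees\n', bestAngle);

Note that 'crop' pads the rotated template with zeros in the corners, which biases the correlation slightly; masking the template with a circular crop would be cleaner.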
Update on Method 1
I tried this: matching the crop of the arrow from the first image against the arrows in the other images. It works on many frames, but not on all of them. A higher-resolution video without that awful GIF compression would help. I now think that this method is the one that will give the better results.
Method 2
Let the purely rotational model be x' = R x, where x and x' are 2D vectors and R is a 2x2 rotation matrix.
We could solve this using least squares just as in the affine case above (namely with A_i = [x y 0 0; 0 0 x y]). However, if x and y are in image coordinates (pixels), the model is plain wrong, because it describes a rotation around the origin. The center of rotation cannot be determined automatically from inside the model (it could be found by doing the affine estimation first, which gives the center for the first frame, and extrapolating from there for every frame).
Assume for simplicity that the center of rotation is the same in every image, and let it be described by the vector t. Then we just have to subtract t from x and x' before estimating the rotational model.
The implementation is actually a bit more difficult, because plain least squares would not enforce the constraint R' R = I on the rotation matrix. See https://math.stackexchange.com/a/77466 and the Kabsch algorithm for more.
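A minimal sketch of the Kabsch step, assuming the center of rotation t (a 1-by-2 vector) has already been subtracted from both point sets:

    function R = fitRotationKabsch(p, q)
        % p, q: n-by-2 centered point sets (center of rotation subtracted).
        H = p' * q;                     % 2x2 cross-covariance of the matches
        [U, ~, V] = svd(H);
        d = sign(det(V * U'));          % guard against a reflection solution
        R = V * diag([1, d]) * U';      % closest proper rotation, R'*R = I
    end

Used per frame, the angle then follows directly from the matrix entries:

    R = fitRotationKabsch(p1 - t, p2 - t);   % t estimated as discussed above
    theta = atan2(R(2,1), R(1,1));           % rotation angle in radians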
Update on Method 2
The problem remains that the virtually non-moving points have to be classified as outliers (the three groups of points you mention in your question). I tried first removing the points that only follow the translation, and then estimating the rotation on the remaining points. This did not work well.
Update 2
I pushed the code to a GitHub repository, MatlabRotationEstimation. For further progress I think higher-resolution input files are needed. It would also be interesting to know whether you need a frame-by-frame rotational speed or whether some aggregate information is sufficient.