
I have a webcam looking down on a surface which rotates about a single-axis. I'd like to be able to measure the rotation angle of the surface.

The camera position and the rotation axis of the surface are both fixed. The surface is a distinct solid color right now, but I do have the option to draw features on the surface if it would help.

Here's an animation of the surface moving through its full range, showing the different apparent shapes:

[animation of the surface rotating through its full range, showing its changing apparent shape]

My approach thus far:

  1. Record a series of "calibration" images, where the surface is at a known angle in each image
  2. Threshold each image to isolate the surface.
  3. Find the four corners with cv2.approxPolyDP(). I iterate through various epsilon values until I find one that yields exactly 4 points.
  4. Order the points consistently (top-left, top-right, bottom-right, bottom-left)
  5. Compute the angle of each edge between adjacent points with atan2.
  6. Use the angles as features to fit sklearn's linear_model.LinearRegression() (a rough sketch of this pipeline follows the list).
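
A minimal sketch of that pipeline, assuming the OpenCV 4.x findContours return signature; the Otsu threshold, file names, and calibration angles below are placeholders:

```python
import cv2
import numpy as np
from sklearn.linear_model import LinearRegression

def corner_angles(path):
    """Return the four edge angles (radians) of the thresholded surface."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)

    # Sweep epsilon until approxPolyDP yields exactly four corners.
    peri = cv2.arcLength(contour, True)
    for eps_frac in np.linspace(0.01, 0.1, 50):
        approx = cv2.approxPolyDP(contour, eps_frac * peri, True)
        if len(approx) == 4:
            break
    else:
        raise RuntimeError("no epsilon produced exactly 4 corners")
    pts = approx.reshape(4, 2).astype(float)

    # Order corners: top-left, top-right, bottom-right, bottom-left.
    s = pts.sum(axis=1)
    d = np.diff(pts, axis=1).ravel()
    ordered = np.array([pts[np.argmin(s)], pts[np.argmin(d)],
                        pts[np.argmax(s)], pts[np.argmax(d)]])

    # Angle of each edge between consecutive corners.
    angles = []
    for i in range(4):
        dx, dy = ordered[(i + 1) % 4] - ordered[i]
        angles.append(np.arctan2(dy, dx))
    return angles

# Three calibration images at known surface angles (placeholder names/values).
cal_paths = ["full_negative.png", "middle.png", "full_positive.png"]
cal_angles = [-30.0, 0.0, 30.0]

X = np.array([corner_angles(p) for p in cal_paths])
model = LinearRegression().fit(X, cal_angles)

# Predict the angle of a new frame.
print(model.predict([corner_angles("unknown.png")])[0])
```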

This approach is getting me predictions within about 10% of the actual angle with only 3 training images (covering the full positive, full negative, and middle positions). I'm pretty new to both OpenCV and sklearn; is there anything I should consider doing differently to improve the accuracy of my predictions? (Probably increasing the number of training images is a big one??)

I did experiment with cv2.moments directly as my model features, and then some values derived from the moments, but these did not perform as well as the angles. I also tried using a RidgeCV model, but it seemed to perform about the same as the linear model.

– Steve Osborne
  • Your input images seem somewhat inconsistent (rounded lines, non-sharp corners). I'd try to get a better image (a high-contrast pattern taped onto a flat, rigid block of something) while you're developing your algorithm, then work on making it numerically robust once you get better results initially. Also, I'm not entirely sure a linear model is going to do it for you, as most cameras technically have spherical lenses (if you could move the camera very far away and zoom in, however, this would help approximate a linear response). – Aaron Aug 02 '18 at 17:38
  • Have you considered solving it analytically? Either with training or with an analytic solution you will have a two-fold ambiguity about the sign of the angle. From point 3 in your approach you can get the apparent length of your surface along the axis perpendicular to the rotation axis (L_per). You also need to know that length at zero degrees (L_zero), which you could get from your training images. The angle will be: theta = acos(L_per / L_zero). – U3.1415926 Aug 08 '18 at 09:10
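
A minimal sketch of the analytic idea from the comment above; both lengths below are placeholder pixel measurements:

```python
import numpy as np

# Apparent length (pixels) perpendicular to the rotation axis, measured from
# the detected corners; the values below are placeholders.
L_zero = 240.0   # length in the zero-degree reference image
L_per = 180.0    # length in the current frame

theta = np.degrees(np.arccos(L_per / L_zero))   # magnitude only; the sign is ambiguous
print(theta)
```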

3 Answers


If I understand correctly, you want to estimate the rotation of the polygon with respect to the camera. If you know the object's dimensions in 3D, you can use solvePnP to estimate the pose of the object, from which you can extract its rotation.

Steps:

  1. Calibrate your webcam to get the intrinsic camera matrix and the distortion coefficients.

  2. Get the 3D measurements of the object corners and find the corresponding points in 2D. Assuming a rectangular planar object, the corners in 3D would be (0, 0, 0), (0, 100, 0), (100, 100, 0), (100, 0, 0).

  3. Use solvePnP to get the rotation and translation of the object.

The rotation you get back is the rotation of your object about its axis. Head-pose estimation examples use the same pipeline; you can modify one to suit your application. A rough sketch is below.
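
A hedged sketch of the solvePnP steps above; the object size, the image corner coordinates, and the camera matrix are placeholders, so substitute the values from your own corner detection and cv2.calibrateCamera:

```python
import cv2
import numpy as np

# 3D corners of the rectangular surface in its own coordinate frame (step 2).
object_pts = np.array([[0, 0, 0],
                       [0, 100, 0],
                       [100, 100, 0],
                       [100, 0, 0]], dtype=np.float32)

# Matching 2D corners detected in the image, in the same order (placeholder values).
image_pts = np.array([[152, 88],
                      [148, 310],
                      [395, 322],
                      [400, 95]], dtype=np.float32)

# Intrinsics from step 1 (placeholder values; use your calibration results).
camera_matrix = np.array([[800, 0, 320],
                          [0, 800, 240],
                          [0, 0, 1]], dtype=np.float64)
dist_coeffs = np.zeros(5)   # assume negligible distortion for this sketch

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, camera_matrix, dist_coeffs)

# Convert the rotation vector to a matrix and read off Euler angles (degrees).
R, _ = cv2.Rodrigues(rvec)
euler_deg = cv2.RQDecomp3x3(R)[0]
print(euler_deg)   # which component tracks the surface's rotation depends on your setup
```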

– Gopiraj
  • I like this method. About as close as you will get to a closed-form solution based on geometry. It handles changes to the camera or rotation axis elegantly. Machine learning or example-based models would be overkill for this task IMO. – Chungzuwalla Sep 07 '18 at 00:36

Your first step is good -- everything after that becomes way way way more complicated than necessary (if I understand correctly).

Don't think of it as 'learning,' just think of it as a reference. Every time you're in a particular position where you DON'T know the angle, take a picture, and find the reference picture that looks most like it. Guess it's THAT angle. You're done! (There may well be indeterminacies; maybe the relationship isn't bijective, but that's where I'd start.)

You can consider this a 'nearest-neighbor classifier,' if you want, but that's just to make it sound better. Measure a simple distance (Euclidean! Why not!) between the uncertain picture, and all the reference pictures -- meaning, between the raw image vectors, nothing fancy -- and choose the angle that corresponds to the minimum distance between observed, and known.

If this isn't working -- and maybe, do this anyway -- stop throwing away so much information! You're stripping things down, then trying to re-estimate them, propagating error all over the place for no obvious (to me) benefit. So when you do a nearest neighbor, reference pictures and all that, why not just use the full picture? (Maybe other elements will change in it? That's a more complicated question, but basically, throw away as little as possible -- it should all be useful in, later, accurately choosing your 'nearest neighbor.')
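
A minimal sketch of that nearest-reference idea; the reference file names and angles are placeholders, and all images are assumed to share the same dimensions:

```python
import cv2
import numpy as np

# Reference frames at known angles (placeholder paths/angles).
ref_paths = ["ref_-30.png", "ref_-15.png", "ref_0.png", "ref_15.png", "ref_30.png"]
ref_angles = np.array([-30.0, -15.0, 0.0, 15.0, 30.0])

def as_vector(path):
    """Load an image as a flat grayscale vector."""
    return cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float32).ravel()

refs = np.stack([as_vector(p) for p in ref_paths])

def estimate_angle(path, k=1):
    """Average the angles of the k references closest (Euclidean) to the query."""
    dists = np.linalg.norm(refs - as_vector(path), axis=1)
    return float(ref_angles[np.argsort(dists)[:k]].mean())

print(estimate_angle("unknown.png", k=1))
```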

– one_observation
  • Also, for future questions of this nature, I would strongly suggest CrossValidated (stats.stackexchange); there's DataScience too, but I find it less developed and largely redundant. – one_observation Aug 02 '18 at 18:25
  • If I turned this into a classification problem, then wouldn't I need a very large number of reference images to be able to predict an arbitrarily precise surface deflection? E.g. if the surface deflection is 86, but I have reference images at 80 and 90, then a classifier would (hopefully) give me 90. But I would really want it to give 86. Is there a way to interpolate like this with a classifier? – Steve Osborne Aug 02 '18 at 19:50
  • Again, way too complicated -- just create a reference-set with one-degree increments instead. Or in general, make enough so that you don't care what the rounding error would be. If you REALLY want, just take the K nearest neighborS, and average over them (over the rotations that each one represents). Yeah you could try to weight them by distance, etc. etc ... hopefully, and I would expect, that's just not remotely necessary. – one_observation Aug 02 '18 at 19:56

Another option that is rather easy to implement, especially since you've already done part of the work, is the following (I've used it to compute the orientation of a cylindrical part from three images acquired while the tube was rotating):

  1. Threshold each image to isolate the surface.
  2. Find the four corners with cv2.approxPolyDP(); alternatively, you could find the four sides of your part with LineSegmentDetector (available from OpenCV 3 onward).
  3. Compute the angle alpha, as depicted in the image below [image: compute alpha].

When your part is rotating, this angle alpha will follow a sine curve. That is, you will measure alpha(theta) = A sin(theta + B) + C. Given alpha you want to know theta, but first you need to determine A, B and C.

  1. You've acquired a number of "calibration" or reference images; you can use all of these to fit a sine curve and determine A, B and C.
  2. Once this is done, you can determine theta from alpha.

Notice that you have to deal with the ambiguity sin(Pi - a) = sin(a): two different angles can produce the same measurement. It is not a problem if you acquire more than one image sequentially; if you have a single static image, you need an extra mechanism to resolve it.
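
A hedged sketch of the sine fit using scipy.optimize.curve_fit; the calibration angles and the measured alpha values below are placeholders:

```python
import numpy as np
from scipy.optimize import curve_fit

# Known surface angles theta and the measured angle alpha from each
# calibration image (radians); the alpha values here are placeholders.
theta_cal = np.radians([-30, -15, 0, 15, 30])
alpha_cal = np.radians([4.8, 7.5, 10.1, 12.3, 13.9])

def model(theta, A, B, C):
    return A * np.sin(theta + B) + C

(A, B, C), _ = curve_fit(model, theta_cal, alpha_cal, p0=(0.1, 0.0, 0.1))

def theta_from_alpha(alpha):
    """Invert alpha = A*sin(theta + B) + C. Returns one candidate; the other
    is pi - arcsin(s) - B, which is the sin(pi - a) = sin(a) ambiguity above."""
    s = np.clip((alpha - C) / A, -1.0, 1.0)
    return np.arcsin(s) - B

print(np.degrees(theta_from_alpha(np.radians(12.0))))
```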

Hope I'm clear enough; the implementation really shouldn't be a problem given what you have done already.

– Christian