Position and orientation estimation by stereo images

Question

I am trying to find out how to estimate position and orientation of an object using stereo camera images.

Assume a description of an object is given either by its edges and edge-relations, or points, or other features. The images are taken from above by a set of two cameras that have a fixed position relative to each other (like Microsoft Kinect gives you).

With Python's skimage toolkit I am able to recognise similarities between the pictures, but I cannot tell whether the object I am searching for is actually in the images; that is as far as I get.

What I want to do beyond that is to segment the known object from the background and to determine its position relative to the cameras (e.g. by comparing its calculated position with the position of a known mark on the floor or something similar, which I don't think will be too hard).

I know Python pretty well and have some experience in basic image processing. If this problem is already solved in OpenCV, I can work with that as well.

Thank you in advance for any help you give, either by naming keywords to improve my search, links, or by sharing your experience in this field!

To illustrate my problem: you have a bunch of Lego bricks of the same kind (shape and color) lying in a chaotic manner, i.e. they overlap completely, partially, or not at all, and have arbitrary orientations. The floor underneath is the same color as the bricks. The cameras look straight down. The task is to find as many bricks as possible and tell their location.

edit: added illustration

So are you trying to get the coordinates of the target object with respect to the cameras? Because, if the cameras are calibrated, it's a straightforward triangulation. What exactly do you mean by "pose"? — kazarey, Mar 04 '17 at 21:08
Exactly, the position relative to the camera. Is triangulation the solution if you want the x, y, z coordinates of the midpoint and the orientation? To visualize it: imagine a red plane floor with the cameras looking straight down from above. Then you have a bunch of red Lego bricks of the same type (e.g. 2x4) lying there chaotically with arbitrary orientations (not exactly this, but it shows the "chaos": https://s-media-cache-ak0.pinimg.com/originals/98/f0/93/98f093a777e1cb4a51e107344ebc8b29.jpg). Which one is on top, and what is its position and orientation? That is my problem^^ thanks — michi1510, Mar 06 '17 at 12:23
I could reply with an answer which is too big for a comment. But I cannot leave an actual answer while your question is on hold. I suggest you edit your question some so that it can be reopened. — kazarey, Mar 06 '17 at 18:55
I guess I still have to compress my reply into comments. In a nutshell, yes, triangulation is the way to get the world (or relative) coordinates of an object. This can be done with OpenCV. Although it is not as simple as calling a couple of functions, it is a well-known routine. Start with camera calibration: here's a [tutorial](http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html) and a [Python sample](https://github.com/opencv/opencv/blob/master/samples/python/calibrate.py). — kazarey, Mar 08 '17 at 20:47
Calibration yields a set of matrices that describe the geometric properties of your camera setup. Using those matrices, you can perform __rectification__. This procedure is also mentioned in the above tutorial and is easily googled. It is a mathematical alignment of the images from your camera pair: it transforms the images so that the same real-world point appears on the same horizontal line (or vertical line, whichever is desired) in both images. Rectification simplifies the search for stereo correspondences (you only have to traverse the same row (column) of the second image). — kazarey, Mar 08 '17 at 20:50
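To make the "same row" search concrete, here is a minimal sketch in plain Python. It assumes already rectified grayscale rows given as lists of intensities, and it uses a sum of absolute differences (SAD) as the matching cost, which is one common choice rather than anything prescribed in this thread. The function name `match_along_row` and its parameters are made up for illustration.

```python
def match_along_row(left_row, right_row, x_left, half_width, max_disparity):
    """Return the disparity (in pixels) of the patch centred at x_left.

    Assumes rectified images: the matching patch lies on the same row of
    the right image, shifted to the left by 0..max_disparity pixels.
    """
    patch = left_row[x_left - half_width : x_left + half_width + 1]
    best_disparity, best_cost = None, float("inf")
    for d in range(max_disparity + 1):
        x_right = x_left - d
        if x_right - half_width < 0:
            break  # the candidate window would leave the image
        candidate = right_row[x_right - half_width : x_right + half_width + 1]
        # Sum of absolute differences between the two patches
        cost = sum(abs(a - b) for a, b in zip(patch, candidate))
        if cost < best_cost:
            best_disparity, best_cost = d, cost
    return best_disparity


# Toy rows: the same bright feature sits at x = 10..12 in the left row
# and at x = 6..8 in the right row.
left_row = [0] * 20
right_row = [0] * 20
left_row[10:13] = [50, 200, 90]
right_row[6:9] = [50, 200, 90]

disparity = match_along_row(left_row, right_row, x_left=11,
                            half_width=2, max_disparity=8)
```

On real images the patch would be two-dimensional and textureless regions (such as a red brick on a red floor) would give ambiguous costs, so this only illustrates why rectification reduces the search to one dimension.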
Once you have found the __disparity__ (the distance in pixels between the positions of the same real-world object in the two stereo images) and the image coordinates of the corresponding stereo pair, you can get its world coordinates using a matrix calculated during the calibration, namely the reprojection matrix Q. This concludes the stereoscopy steps. For greater detail, please refer to the OpenCV tutorial or to the awesome "Learning OpenCV" book by Bradski and Kaehler. — kazarey, Mar 08 '17 at 20:52
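The relation that Q encodes can be sketched by hand for the simplest case. The snippet below assumes an ideal rectified pair with a horizontal baseline, identical focal length (in pixels) and identical principal point in both cameras; the function name and all numeric values are hypothetical. In OpenCV, `cv2.stereoRectify` returns Q and `cv2.reprojectImageTo3D` applies it to a whole disparity map, possibly with different sign conventions than used here.

```python
def reproject_to_3d(x, y, disparity, focal_px, baseline, cx, cy):
    """Triangulate one pixel of the left image of a rectified stereo pair.

    x, y       pixel coordinates in the left image
    disparity  x_left - x_right in pixels (must be positive)
    focal_px   focal length in pixels (assumed equal for both cameras)
    baseline   distance between the camera centres (sets the output unit)
    cx, cy     principal point in pixels
    Returns (X, Y, Z) in the left camera frame.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    z = focal_px * baseline / disparity  # depth from similar triangles
    x_cam = (x - cx) * z / focal_px      # back-project through the pinhole model
    y_cam = (y - cy) * z / focal_px
    return (x_cam, y_cam, z)


# Hypothetical numbers: 700 px focal length, 10 cm baseline, 640x480 image.
point = reproject_to_3d(x=400.0, y=300.0, disparity=35.0,
                        focal_px=700.0, baseline=0.1, cx=320.0, cy=240.0)
```

With cameras looking straight down, Z is the distance from the camera to the brick, so among overlapping bricks the one with the smallest Z (largest disparity) is the one on top.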
I am writing this on the assumption that the imaging target in your case is easily distinguished from the background. "Red Lego bricks" on a "red plane floor" from your comments may turn out to be unreasonably difficult to segment, whatever approach you choose. On the other hand, if either of them is of a contrasting color, the detection can be done by simple edge (and then contour) detection. See the Canny algorithm. — kazarey, Mar 08 '17 at 20:53
Orientation estimation, though, is not as straightforward. Even if the target objects are as simple as Lego bricks, you can lose the information about their orientation simply because their studs are out of view (e.g. due to camera positioning). As a crude approach, you could generate a set of Lego bricks projected onto the view plane based on the geometry information that you have: at each iteration, rotate the brick by some step α and subtract it from the detected brick image. The rotation that produces the smallest difference gives you the orientation. — kazarey, Mar 08 '17 at 20:53
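A rough sketch of this brute-force rotation search, under strong simplifying assumptions: the brick is modelled as a plain rectangle seen from straight above, the detected brick is already available as a binary mask centred in the image, and masks are stored as sets of pixel coordinates. The helper names `render_brick` and `estimate_angle` are invented for this example.

```python
import math


def render_brick(size, length, width, angle_deg):
    """Pixels (row, col) covered by a length x width rectangle that is
    centred in a size x size image and rotated by angle_deg."""
    centre = (size - 1) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    pixels = set()
    for row in range(size):
        for col in range(size):
            dx, dy = col - centre, row - centre
            # Express the pixel in the rectangle's own (rotated) axes
            u = dx * cos_a + dy * sin_a
            v = -dx * sin_a + dy * cos_a
            if abs(u) <= length / 2.0 and abs(v) <= width / 2.0:
                pixels.add((row, col))
    return pixels


def estimate_angle(observed, size, length, width, step_deg):
    """Try every rotation in [0, 180) and keep the one whose rendered
    mask differs from the observed mask in the fewest pixels."""
    best_angle, best_diff = None, None
    for angle in range(0, 180, step_deg):
        rendered = render_brick(size, length, width, angle)
        diff = len(observed ^ rendered)  # pixels set in exactly one mask
        if best_diff is None or diff < best_diff:
            best_angle, best_diff = angle, diff
    return best_angle


# Synthetic "detected" brick: a 24 x 12 px rectangle rotated by 35 degrees.
observed = render_brick(41, 24, 12, 35)
angle = estimate_angle(observed, 41, 24, 12, step_deg=5)
```

The search only covers 0° to 180° because a plain rectangle looks identical after a half turn; resolving that ambiguity needs extra cues such as the studs. The accuracy is limited by the step α, and the cost grows with the number of steps and the mask size.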
Alternatively, you could try to apply the Hough transform to estimate which lines the contour of your brick is made of. After that, pick the lines on the long sides and consider the one that received the fewest votes in the Hough parameter space to be the top one (on the assumption that the studs would reduce the voting). See [this answer](http://stackoverflow.com/a/24121568/2092025) for reference on using the Hough transform for skew estimation. If the Lego bricks are positioned just studs up or studs down, I guess you could perform Lego logo detection :) — kazarey, Mar 08 '17 at 20:54
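The vote comparison can be illustrated with a tiny from-scratch Hough accumulator. This is only a sketch: it assumes the edge pixels of the two long sides have already been extracted as (x, y) points, it simulates the effect of studs by leaving gaps in one edge, and it picks the two long sides as the two strongest cells that share the dominant angle. The function names are hypothetical.

```python
import math
from collections import Counter


def hough_votes(points, theta_step_deg=1, rho_step=1.0):
    """Accumulate votes for lines rho = x*cos(theta) + y*sin(theta)."""
    votes = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(theta_deg, round(rho / rho_step))] += 1
    return votes


def long_sides(votes):
    """Return the two strongest parallel lines as
    [((theta_deg, rho_bin), vote_count), ...], strongest first."""
    (dominant_theta, _), _ = votes.most_common(1)[0]
    parallel = [(cell, n) for cell, n in votes.items()
                if cell[0] == dominant_theta]
    parallel.sort(key=lambda item: item[1], reverse=True)
    return parallel[:2]


# Two horizontal long edges: one complete, one interrupted (as studs might do).
smooth_edge = [(x, 5) for x in range(40)]
studded_edge = [(x, 20) for x in range(40) if x % 4 != 3]

strong, weak = long_sides(hough_votes(smooth_edge + studded_edge))
```

Here `theta_deg` is the angle of the line's normal, so a value of 90 corresponds to a horizontal edge; the edge direction is `theta_deg - 90`. Following the suggestion above, `weak` would be taken as the studded side.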
Honestly, my suggestions on orientation estimation are not very robust, but that's as far as my knowledge goes. You could try to google something on the matter, because the "orientation estimation" problem is quite well known in machine vision (usually in the context of robotics, machinery, etc.). Either way, good luck, hope I was of some help. — kazarey, Mar 08 '17 at 20:58
Thank you kazarey, your answers already give me a lot of information, systematically derived. After reading through the topics, I will come back with my understanding of how to solve the problem. One point that troubles me: with Lego bricks, calculating the orientation of the long edges is quite easy, but doing this for objects without such a simple contour will be troublesome. — michi1510, Mar 09 '17 at 06:54
