I am trying to find out how to estimate position and orientation of an object using stereo camera images.
Assume a description of an object is given either by its edges and edge-relations, or points, or other features. The images are taken from above by a set of two cameras that have a fixed position relative to each other (like Microsoft Kinect gives you).
With Python's skimage toolkit I am able to recognise the similarites in the picture without telling if the searched object is in the images, but that is as far as I get.
What I want to do more is to segment the known object from the background and to be able to tell its position relatively to the cameras (e.g. by comparing its calculated position with the position of a known mark on the floor or something similar which I don't think will be too hard).
I know Python pretty well and have some experience in basic image processing. If this problem is already solved in OpenCV, I can work with that as well.
Thank you in advance for any help you give, either by naming keywords to improve my search, links, or by sharing your experience in this field!
To illustrate my problem: You have a bunch of same kind (shape+color) lego bricks laying in a chaotic manner, e.g. they are overlaying completely/partially or not at all and have an arbitrary orientation. The floor underneath is of the same color as the bricks. Cameras look straight down. Task is to find as many as bricks as possible and tell there location.
edit: added illustration