How can the 3D structure of a scene be determined from video frames?

Question

This is an exam question, related to 3D reconstruction from images.

The figure below shows a frame from a video sequence obtained by a single video camera moving through a static indoor scene. The sequence of images is to be used to generate a 3D representation of the scene. The small white squares superimposed on the image are regions that have been automatically selected as suitable "interesting" locations to be matched between frames.

Considering we have already obtained matching positions in different frames, how can the 3D structure of the scene be determined? Please note that the images are obtained from the same camera and that, in general, the change in position of the camera between frames is not accurately known.

My Approach:

The problem is that all the images come from the same camera and there is no information about how the camera's position changes between frames. If this were not the case, I would describe triangulation, where the depth Z can be found from the disparity and the baseline. However, I suppose I cannot do this in our case. I thought of explaining calibration methods, but I guess that is not what the question is asking for, as the intrinsic parameters of the camera are not known. Forsyth's book describes some methods for obtaining 3D structure from one camera, such as using the Longuet-Higgins relation and calibration matrices. However, none of that was covered at university, and I struggle to understand it.
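
For reference, here is how I currently understand the two relations mentioned above. This is only my own minimal sketch in Python, not something taken from the exam or the book: the function names, the numeric values, and the choice of a rotation about the y-axis are all assumptions made for illustration. The first function is the rectified two-camera relation Z = f * B / d. The second evaluates the Longuet-Higgins (epipolar) constraint x2^T E x1 with E = [t]x R, which should be zero for a correct match in normalized image coordinates.

```python
import math


def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Rectified stereo depth: Z = f * B / d (f in pixels, B in metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def rotate_y(p, angle):
    """Rotate a 3-vector about the y-axis (an assumed, illustrative rotation R)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] + s * p[2], p[1], -s * p[0] + c * p[2])


def epipolar_residual(point, angle, t):
    """Return x2 . (t x (R x1)), which equals x2^T E x1 with E = [t]x R.

    `point` is a 3D point in the first camera's frame; the second view sees
    it at R * point + t. Both R and t are assumed known here, which is NOT
    the case in the exam question.
    """
    x1 = tuple(v / point[2] for v in point)          # normalized coords, view 1
    p2 = tuple(r + ti for r, ti in zip(rotate_y(point, angle), t))
    x2 = tuple(v / p2[2] for v in p2)                # normalized coords, view 2
    return dot(x2, cross(t, rotate_y(x1, angle)))


if __name__ == "__main__":
    # Hypothetical values: f = 800 px, B = 0.5 m, d = 20 px.
    print(depth_from_disparity(800.0, 0.5, 20.0))
    # Hypothetical scene point and camera motion; residual should be ~0.
    print(epipolar_residual((0.5, -0.2, 4.0), 0.1, (0.3, 0.0, 0.05)))
```

With a single moving camera, R and t are not given, which is exactly my difficulty. As far as I understand, they would have to be estimated from the matched points themselves, and t could then only be recovered up to an unknown scale.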

How can the 3D structure of a scene be determined from video frames?
