I have multiple images of an object taken by the same calibrated camera. Let's say calibrated means both intrinsic and extrinsic parameters (I can put a checkerboard next to the object, so all parameters can be retrieved). On these images I can find matching keypoints using SIFT or SURF plus some matching algorithm; this is basic OpenCV. But how do I do the 3D reconstruction of these points from multiple images? This is not a classic stereo arrangement: there are more than 2 images with the same object points on them, and I want to use as many as possible for increased accuracy.

Are there any built-in OpenCV functions that do this?

(Note that this is done off-line; the solution does not need to be fast, but it should be robust.)


2 Answers

I guess you are looking for so-called Structure from Motion (SfM) approaches. They use multiple images taken from different viewpoints and return a 3D reconstruction (e.g. a point cloud). It looks like OpenCV has an SfM module in the contrib package, but I have no experience with it.

However, I used to work with bundler. It was quite uncomplicated: it returns all the information (camera calibration and point positions) as a text file, and you can view the point cloud with Meshlab. Please note that it uses SIFT keypoints and descriptors to establish correspondences.

  • thanks, I managed to figure it out myself, but your answer was helpful! – icguy Jul 14 '16 at 15:03
  • you're welcome, but usually you can provide camera intrinsics to SfM approaches to improve the result (at least bundler is able to estimate them itself, though depending on the number of images the result was worse)... that's why I thought SfM would be easier for you than putting a checkerboard next to your object (and placing it such that it is apparent in every image)... – gfkri Jul 14 '16 at 15:28

I think I have found a solution to this. Structure from Motion algorithms deal with the case where the cameras are not calibrated, but in this case all intrinsic and extrinsic parameters are known.

The problem reduces to a linear least-squares problem:

We have to compute the coordinates of a single object point:

C = [x, y, z]'                   (the unknown 3D coordinates)
X = [[C], [1]] = [x, y, z, 1]'   (the same point in homogeneous coordinates)

We are given n images, each with a known projection matrix:

Pi = Ki * [Ri|ti]

These matrices are already known. The object point is projected onto image i at

Ui = [ui, vi]'

In homogeneous coordinates we can write (the operator * denotes matrix multiplication, dot product, and scalar multiplication alike):

[ui * wi, vi * wi, wi]' = Pi * X

Pi = [[p11i, p12i, p13i, p14i],
      [p21i, p22i, p23i, p24i],
      [p31i, p32i, p33i, p34i]]
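
As a quick numerical sanity check of this projection equation, here is a minimal numpy sketch (the matrix values are made up for illustration):

```python
import numpy as np

# A hypothetical 3x4 projection matrix Pi = Ki * [Ri|ti] (values made up).
P = np.array([[800.0,   0.0, 320.0, 10.0],
              [  0.0, 800.0, 240.0, 20.0],
              [  0.0,   0.0,   1.0,  5.0]])

# Object point in homogeneous coordinates: X = [x, y, z, 1]'.
X = np.array([1.0, 2.0, 10.0, 1.0])

# [ui * wi, vi * wi, wi]' = Pi * X, then divide by wi to get pixel coordinates.
uw, vw, w = P @ X
u, v = uw / w, vw / w
print(u, v)
```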

Let's define the following:

p1i = [p11i, p12i, p13i] (the first row of Pi missing the last element)
p2i = [p21i, p22i, p23i] (the second row of Pi missing the last element)
p3i = [p31i, p32i, p33i] (the third row of Pi missing the last element)

a1i = p14i
a2i = p24i
a3i = p34i

Then we can write:

Q = [x, y, z]
wi = p3i * Q + a3i
ui = (p1i * Q + a1i) / wi
   = (p1i * Q + a1i) / (p3i * Q + a3i)
ui * (p3i * Q + a3i) - p1i * Q - a1i = 0
(ui * p3i - p1i) * Q = a1i - ui * a3i

Similarly for vi:

(vi * p3i - p2i) * Q = a2i - vi * a3i

And this holds for i = 1..n. We can write this in matrix form:

G * Q = b

G = [[u1 * p31 - p11],
     [v1 * p31 - p21],
     [u2 * p32 - p12],
     [v2 * p32 - p22],
     ...         
     [un * p3n - p1n],
     [vn * p3n - p2n]]

b = [[a11 - a31 * u1],
     [a21 - a31 * v1],
     [a12 - a32 * u2],
     [a22 - a32 * v2],
     ...
     [a1n - a3n * un],
     [a2n - a3n * vn]]

Since G and b are known from the Pi matrices and the image points [ui, vi], we can compute the pseudoinverse of G (call it G_) and compute:

Q = G_ * b
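
The derivation above can be sketched in numpy as follows (a minimal sketch; the function and variable names are my own, and the self-test cameras are synthetic):

```python
import numpy as np

def triangulate(proj_mats, points_2d):
    """Least-squares triangulation of one 3D point from n calibrated views.

    proj_mats: list of n known 3x4 projection matrices Pi = Ki * [Ri|ti]
    points_2d: list of n observed pixel coordinates (ui, vi)
    Returns Q = [x, y, z], the least-squares estimate of the object point.
    """
    G, b = [], []
    for P, (u, v) in zip(proj_mats, points_2d):
        p1, p2, p3 = P[0, :3], P[1, :3], P[2, :3]   # p1i, p2i, p3i
        a1, a2, a3 = P[0, 3], P[1, 3], P[2, 3]      # a1i, a2i, a3i
        G.append(u * p3 - p1)   # row for the ui equation
        b.append(a1 - u * a3)
        G.append(v * p3 - p2)   # row for the vi equation
        b.append(a2 - v * a3)
    # Q = G_ * b, i.e. the linear least-squares solution via the pseudoinverse.
    Q, *_ = np.linalg.lstsq(np.array(G), np.array(b), rcond=None)
    return Q

# Self-test with synthetic data: project a known point through three
# made-up cameras, then recover it.
rng = np.random.default_rng(0)
Q_true = np.array([1.0, 2.0, 10.0])
X = np.append(Q_true, 1.0)          # homogeneous coordinates
Ps, uvs = [], []
for _ in range(3):
    P = rng.standard_normal((3, 4)) # stand-in for a real Ki * [Ri|ti]
    x = P @ X
    Ps.append(P)
    uvs.append((x[0] / x[2], x[1] / x[2]))
Q_est = triangulate(Ps, uvs)
print(Q_est)   # should be close to [1, 2, 10]
```

With noise-free observations the system G * Q = b is satisfied exactly, so the least-squares solution recovers the point; with noisy keypoints it returns the algebraic least-squares estimate over all views.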
icguy
  • 560
  • 4
  • 14
  • At the time of writing this, I'm sure it made sense, but looking at this years later, I have no idea what this is. – icguy Apr 08 '21 at 11:04