
I realize there are many cans of worms related to what I'm asking, but I have to start somewhere. Basically, what I'm asking is:

Given two photos of a scene, taken with unknown cameras, to what extent can I determine the (relative) warping between the photos?

Below are two images of the 1904 World's Fair. They were taken at different levels on the wireless telegraph tower, so the cameras are more or less vertically in line. My goal is to create a model of the area (in Blender, if it matters) from these and other photos. I'm not looking for a fully automated solution, e.g., I have no problem with manually picking points and features.

Over the past month, I've taught myself what I can about projective transformations and epipolar geometry. For some pairs of photos, I can do pretty well by finding the fundamental matrix F from point correspondences. But the two below are causing me problems. I suspect that there's some sort of warping - maybe just an aspect ratio change, maybe more than that.
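
For reference, a minimal Octave sketch of that estimation step, using ransacfitfundmatrix from Kovesi's toolbox (pts1 and pts2 are placeholder names for the hand-picked Nx2 pixel correspondences, and the RANSAC threshold is only a starting guess):

    % Convert hand-picked Nx2 pixel coordinates to 3xN homogeneous form.
    x1 = [pts1'; ones(1, size(pts1, 1))];   % points in image 1
    x2 = [pts2'; ones(1, size(pts2, 1))];   % matching points in image 2

    % Robust estimate of the fundamental matrix; the threshold t is in
    % normalized units (the toolbox normalizes the points internally).
    t = 0.001;
    [F, inliers] = ransacfitfundmatrix(x1, x2, t);

    % The epipoles are the null vectors of F and F'.
    e1 = null(F);      % epipole in image 1 (F * e1 = 0)
    e2 = null(F');     % epipole in image 2 (F' * e2 = 0)
    e1 = e1 / e1(3);   % back to pixel coordinates
    e2 = e2 / e2(3);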

My process is as follows:

  1. I find correspondences between the two photos (the red jagged lines seen below).
  2. I run the point pairs through MATLAB (actually Octave) to find the epipoles. Currently, I'm using Peter Kovesi's Functions for Computer Vision toolbox.
  3. In Blender, I set up two cameras with the images overlaid. I orient the first camera based on the vanishing points. I also determine the focal lengths from the vanishing points. I orient the second camera relative to the first using the epipoles and one of the point pairs (below, the point at the top of the bandstand).
  4. For each point pair, I project a ray from each camera through its sample point, and mark the closest convergence of the pair (in light yellow below; a sketch of this step follows the list). I realize that this leaves out information from the fundamental matrix - see below.
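
The closest-convergence point is the midpoint of the common perpendicular between the two rays. A self-contained Octave sketch (p1, p2 are the camera centres and d1, d2 the ray directions, all in world coordinates):

    % Closest point of convergence between rays p1 + s*d1 and p2 + t*d2.
    function m = ray_midpoint(p1, d1, p2, d2)
      w = p1 - p2;
      a = dot(d1, d1);  b = dot(d1, d2);  c = dot(d2, d2);
      d = dot(d1, w);   e = dot(d2, w);
      denom = a*c - b^2;            % zero only for parallel rays
      s = (b*e - c*d) / denom;
      t = (a*e - b*d) / denom;
      q1 = p1 + s*d1;               % closest point on ray 1
      q2 = p2 + t*d2;               % closest point on ray 2
      m = (q1 + q2) / 2;            % midpoint of the common perpendicular
    end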

[Image: Two views of Plaza of Orleans]

As you can see, the points don't converge very well. The ones from the left camera spread out the farther you move horizontally from the bandstand point. I'm guessing that this reflects differences in the camera intrinsics. Unfortunately, I can't find a way to recover the intrinsics from an F derived from point correspondences.

In the end, I don't think I care about the individual intrinsics per se. What I really need is a way to apply the intrinsics to "correct" the images so that I can use them as overlays to manually refine the model.
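
One hedged sketch of what such a correction could look like: if K is an estimate of one image's intrinsics (e.g., from vanishing points, as sketched below) and K_ref is a reference intrinsic matrix with square pixels and a centred principal point, then the homography K_ref * inv(K) removes aspect-ratio and shear differences at the pixel level:

    % Hypothetical correction: map pixels (or picked points) through
    % H_corr so the image behaves as if taken with intrinsics K_ref.
    H_corr = K_ref * inv(K);
    x_corr = H_corr * x;               % x is 3xN homogeneous pixel coords
    x_corr = x_corr ./ x_corr(3, :);   % dehomogenize

Applying the same homography as an image warp would give the corrected overlay; note it can only fix linear intrinsic differences, not lens distortion.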

Is this possible? Do I need other information? Obviously, I have little hope of finding anything about the camera intrinsics. There is some obvious structural info though, such as which features are orthogonal. I saw a hint somewhere that the vanishing points can be used to further refine or upgrade the transformations, but I couldn't find anything specific.
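
On the vanishing-point hint: under the usual assumptions of square pixels and zero skew, three vanishing points of mutually orthogonal scene directions determine both the principal point and the focal length. This is a standard result; a sketch (v1, v2, v3 are 2x1 vanishing points in pixel coordinates, all of which must be finite):

    % The principal point is the orthocentre of the triangle v1 v2 v3:
    % each altitude constraint (p - vi) . (vj - vk) = 0 is one linear
    % equation in p.
    A = [(v1 - v2)'; (v2 - v3)'];
    b = [dot(v1 - v2, v3); dot(v2 - v3, v1)];
    p = A \ b;                          % principal point
    % Orthogonality of any two directions gives the focal length
    % (the dot product is negative when p lies inside the triangle).
    f = sqrt(-dot(v1 - p, v2 - p));     % focal length in pixels
    K = [f 0 p(1); 0 f p(2); 0 0 1];    % intrinsic matrix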

Update 1

I may have found a solution, but I'd like someone with some knowledge of the subject to weigh in before I post it as an answer. It turns out that Peter's Functions for Computer Vision has a function for doing a RANSAC estimate of the homography from the sample points. Using m2 = H*m1, I should be able to plot the mapping of m1 -> m2 on top of the actual m2 points on the second image.
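
A sketch of that check, reusing the homogeneous x1/x2 points from above (ransacfithomography is presumably the toolbox function in question; img2 is a placeholder for the second photo, and imshow requires Octave's image package):

    % Fit H robustly, then compare the predicted points H*m1 with the
    % measured m2 points on top of the second image.
    [H, inliers] = ransacfithomography(x1, x2, 0.001);  % threshold: a guess
    m2_pred = H * x1;
    m2_pred = m2_pred ./ m2_pred(3, :);        % dehomogenize
    imshow(img2); hold on;
    plot(x2(1, :), x2(2, :), 'g+');            % measured points
    plot(m2_pred(1, :), m2_pred(2, :), 'rx');  % points predicted by H
    hold off;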

The only problem is, I'm not sure I believe what I'm seeing. Even on an image pair that lines up pretty well using the epipoles from F, the mapping from the homography looks pretty bad.

I'll try to capture an understandable image, but is there anything wrong with my reasoning?

  • What's your end goal here? As you mention, there are a lot of things that are difficult to estimate, so it'd be good to know exactly what you're looking for. – Jack Morrison Jan 01 '17 at 22:09
  • @JackMorrison - Yeah, my post was kind of long, but the goal was hidden in the third paragraph. The ultimate goal is to do a Blender model of the 1904 World's Fair (or parts of it). The immediate goal is to align the cameras so that I can use the overlaid photos as a guide to do the detailed modeling. To that end, I eventually need to adjust the photos so that they, e.g., show the objects with the correct aspect ratio & orthogonality, so I can do a metric reconstruction. I'm hoping this might also help with some problematic image pairs, like the two above. – Jabberwock Jan 02 '17 at 01:35
  • Gotcha. So you want to do the modeling by hand or by doing a 3D reconstruction with Blender? – Jack Morrison Jan 02 '17 at 05:24
  • @Jack: If I understand the terminology, I want to do a sparse 3D reconstruction (i.e., via point correspondences), and then I'll fill in the details by hand. – Jabberwock Jan 02 '17 at 05:41

1 Answer


A couple of answers and suggestions (in no particular order):

  1. A homography will only correctly map between point correspondences when either (a) the camera undergoes a pure rotation (no translation) or (b) the corresponding points are all co-planar.
  2. The fundamental matrix only relates uncalibrated cameras. The process of recovering a camera's calibration parameters (intrinsics) from unknown scenes, known as "auto-calibration," is a rather difficult problem. You'd need these parameters (focal length, principal point) to correctly reconstruct the scene.
  3. If you have (many) more images of this scene, you could try using a system such as VisualSFM (http://ccwu.me/vsfm/). It will attempt to automatically solve the Structure from Motion problem, including point matching, auto-calibration, and sparse 3D reconstruction.
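
Expanding on point 2 (and on the essential-matrix suggestion in the comments below): given intrinsic estimates K1 and K2, e.g., from the vanishing points, the fundamental matrix upgrades to an essential matrix, which decomposes into a rotation and a unit-length translation. A sketch of the standard decomposition:

    % Upgrade F to an essential matrix using estimated intrinsics.
    E = K2' * F * K1;
    % (Optionally re-project E so its singular values are (1, 1, 0).)
    [U, ~, V] = svd(E);
    if det(U) < 0, U = -U; end          % enforce proper rotations
    if det(V) < 0, V = -V; end
    W = [0 -1 0; 1 0 0; 0 0 1];
    R1 = U * W  * V';                   % candidate rotation 1
    R2 = U * W' * V';                   % candidate rotation 2
    t  = U(:, 3);                       % translation, up to sign and scale

This yields four candidate poses, (R1, ±t) and (R2, ±t); the physical one is the combination that puts triangulated points in front of both cameras.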
  • Could you give me more details (or pointers to details) on #2? I do have an estimate of the focal length and principal point from the three vanishing points. I've seen auto-calibration described in a lot of the papers I've read, but nothing that will let me get there from the fundamental matrix (or at least nothing in sufficient detail that I could see how to get there). It looks like I can get the relative projection matrix from the "infinite homography", but I can't figure out how to get there either. – Jabberwock Jan 03 '17 at 18:41
  • For #1 (I was running out of room on the previous comment), I was hoping that the homography might indicate distortion between the two. E.g., if points match well at the top but not the bottom, there might be a shear involved. Also, I was hoping the homography would be able to map correctly in the presence of aspect ratio differences. – Jabberwock Jan 03 '17 at 18:47
  • The homography won't tell you anything, unfortunately, since these are projected points without the constraints I mentioned. – Jack Morrison Jan 04 '17 at 06:22
  • You could take your estimates of the camera intrinsics and turn your fundamental matrix into an essential matrix, which relates 2 calibrated cameras. The essential matrix can then be decomposed into a rotation and (scale-less) translation. If there are more differences than that, they're probably not possible to recover from just the 2 images. – Jack Morrison Jan 04 '17 at 06:24
  • OK, thanks. There's still a lot I don't understand here, but I feel like I've beaten this to death. I guess I'll keep on reading. One last question, though - I got pitifully few views on this question (I'm sure half of them are me reloading the page). Is there a better way/place that I could have asked this? Maybe different tags? – Jabberwock Jan 05 '17 at 02:48
  • I added the tags I could think of to the questions. Not sure what else, given it's just not as popular of an SO topic. – Jack Morrison Jan 05 '17 at 05:03