
I’m getting obscenely large 3D point coordinates from the reconstruction estimated by OpenCV's SFM module.

I am attempting to reconstruct, in 3D, only the tracked 2D points in this tunnel video, which is about 5 seconds long. The video has a resolution of 1920x1080 pixels and a frame rate of 60 fps, so roughly 300 frames.

GIF Conversion of video

The 2D track points in the above image are user-defined and tracked with the Lucas-Kanade method. No camera information came with this shot, so I am assuming the video was taken with a monocular camera.

Following the SFM module's official tutorial, I arranged my tracked 2D points as one 2xN matrix per frame, with one column per track:


std::vector<cv::Mat_<double>> pointMatList;
  
int numTracks = (int)track2DList.size();
int numFrames = 300;  // about 5 s of video at 60 fps
  
for(int i = 0; i < numFrames; ++i)
{
    cv::Mat_<double> frame(2, numTracks);
    
    for(int j = 0; j < numTracks; ++j)
    {
        frame(0, j) = track2DList[j][i][0];
        frame(1, j) = track2DList[j][i][1];
    }
    
    pointMatList.push_back(frame);
}

Next, I created a camera matrix K to serve as the initial guess. In theory, precalibrating the camera should not be needed, since the SFM module estimates/refines the camera’s intrinsics when you invoke the run() or reconstruct() routines. Also, since I have no idea what camera was used for this shot, I had to guess the focal length in pixels.

Code for the initial guess of the camera matrix K:


// Using a guess of 30-degrees for FOV
double initFocal = (_imageSize.width * 0.5) / tan(30 * 0.5 * (M_PI/180.0));
cv::Matx33d K = cv::Matx33d(initFocal,  0,  _imageSize.width/2.,
                            0,          initFocal,  _imageSize.height/2.,
                            0,          0,          1);
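For reference, the guess above uses the pinhole relation f = (w / 2) / tan(FOV / 2). Below is a minimal standalone sketch of that relation, assuming the 30 degrees is a horizontal field of view. The helper name focalLengthPixels is hypothetical and is not part of OpenCV.

```cpp
#include <cmath>

// Hypothetical helper (not an OpenCV function): focal length in pixels for a
// pinhole camera, given the image width in pixels and an ASSUMED horizontal
// field of view in degrees.
double focalLengthPixels(double imageWidth, double horizontalFovDeg)
{
    const double kPi = 3.14159265358979323846;
    const double halfFovRad = 0.5 * horizontalFovDeg * kPi / 180.0;
    return (0.5 * imageWidth) / std::tan(halfFovRad);
}
```

With a width of 1920 and a 30-degree FOV this gives about 3582.77, the value reported as f in the "Original intrinsics" line of the log further down. A wider assumed FOV gives a smaller focal length; 90 degrees, for example, gives 960.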

I then set up the following options for the SFM pipeline and ran it:


int keyframe1 = 0;
int keyframe2 = 299;
int select_keyframes = 1;

const double k1 = 0;
const double k2 = 0;
const int verbosity = 1;
int refine_intrinsics =
    cv::sfm::SFM_REFINE_FOCAL_LENGTH | cv::sfm::SFM_REFINE_PRINCIPAL_POINT |
    cv::sfm::SFM_REFINE_RADIAL_DISTORTION_K1 | cv::sfm::SFM_REFINE_RADIAL_DISTORTION_K2;

// Configuring reconstruction options
cv::sfm::libmv_ReconstructionOptions options(keyframe1, keyframe2, refine_intrinsics, select_keyframes, verbosity);

// Configuring initial camera intrinsics
cv::sfm::libmv_CameraIntrinsicsOptions camOptions(cv::sfm::SFM_DISTORTION_MODEL_POLYNOMIAL,
                                                  initFocal,
                                                  initFocal,
                                                  _imageSize.width/2.,
                                                  _imageSize.height/2.,
                                                  k1,
                                                  k2);

std::vector<cv::Mat> Rs_est;
std::vector<cv::Mat> Ts_est;
std::vector<cv::Mat_<double>> points3DEstList;
            
cv::Ptr<cv::sfm::BaseSFM> sfmObj = cv::sfm::SFMLibmvEuclideanReconstruction::create(camOptions, options);

// Running the reconstruction routine finally...
sfmObj->run(pointMatList, K, Rs_est, Ts_est, points3DEstList);

Upon extracting the contents of points3DEstList, I get obscenely large values for the 3D positions of the 2D points. See the log below:


Original intrinsics: f=3582.77 cx=960 cy=540 w=1920 h=1080
Final intrinsics: f=3582.77 cx=960 cy=540 w=1920 h=1080 k1=5.6255e-12 k2=-1.11928e-09

Skipped 0 markers.
Reprojected 7800 markers.
Total error: 2.08411e-08
Average error: 2.67193e-12 [pixels].

-- Reconstructed 3D Points:

[
[ -397065034781.862366, -603793572962.432861, 10138045231494.660156],
[ -349920118787.534668, -526880176894.646667, 10144148487534.708984],
[ -305747836170.306641, -448685804277.752991, 10280912331627.828125],
[  254597692276.514923, -421369493753.833679, 10037951348921.263672],
[  283441495883.048035, -475484759494.170776,  9871476424112.546875],
[  330541746370.687256, -582449140603.217285, 10089272406914.730469],
[ -199303184288.419708, -609771406686.366089, 10253631210867.880859],
[  218634077882.103668, -619596055520.065674,  9779117081479.052734],
[  192011968990.684753, -525239898537.542786, 10056182362791.714844],
[ -180972569526.857147, -478788694755.319580, 10155172883942.697266],
[ -111343811122.347504,  350259040943.077454, 10141705796722.130859],
[     427378239.279521,  355859768222.684448, 10134145755062.722656],
[  136181407490.486130,  349124303694.458984, 10128875211873.966797],
[  255620375947.170013,  321927183042.141846, 10101467051070.068359],
[  376595641930.277954,  359228424744.958252,  9961719359665.589844],
[  452307044682.328979,  420915947370.653870, 10092332970681.763672],
[  547344473075.463257,  487706757248.445312, 10089038909142.201172],
[ -578555531319.279785,  424974658088.160645, 10112222523170.773438],
[ -423484231093.825867,  456207438176.313171, 10144216920645.845703],
[ -401193858380.593872,  309267640100.370972, 10216747380118.269531],
[ -326494133679.072144,  342591403747.613953, 10188685478560.978516],
[ -320631261694.793701,  250194576078.970764, 10158513659603.068359],
[ -148806978507.747040,  631636630959.099365, 10178551130686.199219],
[  -39497276007.713699,  629822741799.091553, 10138166160031.388672],
[   51115921539.170876,  633243295034.075806, 10136859945807.154297],
[  135085196262.745728,  629711521038.763916, 10109250092209.322266],
]

I don’t think the reconstructed 3D track points should be this big. It also seems that the pipeline made no change to my initial guess for the focal length in pixels (f is 3582.77 both before and after), which makes me doubly concerned. I also posted a question in the OpenCV Forums here, and the most helpful answer I got was that there is perhaps a scale ambiguity. However, the tutorial does not mention any compensation for it, so my assumption is that the scale ambiguity is already handled in the pipeline's implementation. I could be wrong, given how many assumptions I am making, and I'd gladly be wrong if all I had to do was subtract the mean of the 3D points and normalize them by the standard deviation.
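To make the normalization idea concrete, here is a minimal sketch of it, written without OpenCV so it stands alone. It moves the points so their centroid sits at the origin and divides by the RMS distance from the centroid. The names Point3 and normalizePoints are hypothetical, and this only rescales the output: it would not by itself show whether the underlying reconstruction is sound, and the same shift and scale would presumably have to be applied to the camera translations to keep everything consistent.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Point3 = std::array<double, 3>;  // hypothetical alias for one 3D point

// Hypothetical helper: center the points on their centroid, then divide by
// the RMS distance to the centroid so the cloud has unit spread.
std::vector<Point3> normalizePoints(const std::vector<Point3>& points)
{
    if (points.empty()) return {};

    // Centroid of the point cloud.
    Point3 mean = {0.0, 0.0, 0.0};
    for (const Point3& p : points)
        for (std::size_t k = 0; k < 3; ++k) mean[k] += p[k];
    for (std::size_t k = 0; k < 3; ++k)
        mean[k] /= static_cast<double>(points.size());

    // RMS distance from the centroid.
    double sumSq = 0.0;
    for (const Point3& p : points)
        for (std::size_t k = 0; k < 3; ++k)
        {
            const double d = p[k] - mean[k];
            sumSq += d * d;
        }
    const double rms = std::sqrt(sumSq / static_cast<double>(points.size()));

    // Leave the scale untouched if all points coincide.
    const double scale = (rms > 0.0) ? 1.0 / rms : 1.0;

    std::vector<Point3> out;
    out.reserve(points.size());
    for (const Point3& p : points)
        out.push_back({(p[0] - mean[0]) * scale,
                       (p[1] - mean[1]) * scale,
                       (p[2] - mean[2]) * scale});
    return out;
}
```

Because the result depends only on the shape of the cloud, two inputs that differ by a uniform scale factor (say 1 versus 1e12) normalize to the same coordinates.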

If someone can shed some light on this, it would be appreciated. Thank you for reading.

JDBones
  • crosspost: https://forum.opencv.org/t/sfm-estimated-3d-points-from-reconstruction-are-obscenely-high/10032 -- tldr: SFM inherently has scale ambiguity – Christoph Rackwitz Aug 31 '22 at 23:29
