0

I want to transform many points (ideally a whole 720p image) at ~30 fps. Right now I just loop through a mask and look for marked pixels, then transform every marked pixel to a new frame. Is there any way to speed this up? The code runs on a Windows tablet, so I don't know whether CUDA could help.

//Look for white pixels in mask image and transform them to new frame orientation
for (int row = 0; row < mask.rows; row++){
    for (int col = 0; col < mask.cols; col++){

        if (mask.at<uchar>(row, col) == 255){

            //Point in 2D hom
            p = (Mat_<double>(3, 1) << col, row, 1);
            p11 = CameraMatrix480.inv()*p;  //Pixel-->Camera


            //Project 2D Points to table
            double d = abs((p11 - midCam).dot(table_normal_cam)); //intersection of point with table surface is z value
            ps = p11 - d*table_normal_cam;
            p11 *= -Mat(p11 - ps).at<double>(2);

            //Get point in new frame in hom camera coordinates
            p11.copyTo(p_hom1(Range(0, 3), Range(0, 1)));
            p_hom2 = M * p_hom1; //p_hom in frame2

            //Point in frame2 in pixel coordinates
            p12 = (1 / p_hom2.at<double>(2))*(CameraMatrix480*p_hom2(Range(0, 3), Range(0, 1))); //Camera-->Pixel
            pixel = Point(p12.at<double>(0), p12.at<double>(1));

            //Check if new location is in the frame
            if (rect.contains(pixel)){
                RGB& rgb = output.ptr<RGB>(pixel.y)[pixel.x];
                rgb = white;
            }

        }
    }
}
  • Can you write what you are doing as mathematical formulas? In linear algebra there may be steps you can reorder so that parts can be precomputed. – Micka Sep 08 '15 at 13:10
  • Basically I have a test setup with a table in a room. I grab a video frame of the table and draw on that picture. Then I map the drawing onto the table surface (so I get 3D camera coordinates with depth) and transform the drawing to the orientation of the next frame. This lets me draw something in one grabbed frame and have it stay at the same position in every other frame. – Wurzelsepp Sep 09 '15 at 12:14
  • That doesn't help with optimization... I meant mathematical formulas. But just out of curiosity: do you compile in release mode with release libraries? – Micka Sep 09 '15 at 12:21
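
One way to write the loop body as formulas, as a sketch under assumptions that the question does not state: every marked pixel is taken to lie exactly on the table plane $n^\top X = d$ with $d \neq 0$ (where $n$ plays the role of `table_normal_cam` and $d$ is $n^\top$ times `midCam`), and `M` is a rigid transform with rotation $R$ and translation $t$. Here $K$ stands for `CameraMatrix480`; the names $\lambda$, $X_1$, $X_2$ and $H$ are introduced only for illustration.

```latex
\begin{aligned}
X_1 &= \lambda K^{-1} p, \qquad \lambda = \frac{d}{n^\top K^{-1} p}
  && \text{(back-project pixel } p \text{ onto the plane)} \\
X_2 &= R X_1 + t = \Bigl(R + \tfrac{1}{d}\, t\, n^\top\Bigr) X_1
  && \text{(using } n^\top X_1 = d\text{)} \\
p' &\sim K X_2 \sim \underbrace{K \Bigl(R + \tfrac{1}{d}\, t\, n^\top\Bigr) K^{-1}}_{H}\, p
  && \text{(equal up to scale)}
\end{aligned}
```

Under these assumptions the whole chain collapses into a single 3x3 matrix $H$ that can be computed once per frame, and OpenCV's `cv::warpPerspective` can apply such a matrix to the entire mask in one call. Whether this matches the depth scaling used in the loop above would need to be checked against the actual setup.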

4 Answers

4

Without testing, I think computing the inverse of the camera matrix is the most expensive operation in your code. Assuming the camera matrix is constant, you could precompute the inverse.

Mat invCameraMatrix(CameraMatrix480.inv());
...
p11 = invCameraMatrix*p;  //Pixel-->Camera
...

In addition, you could easily parallelize the for loop with OpenMP and check whether that improves performance. CUDA requires an NVIDIA graphics card, which is probably not available in your Windows tablet.
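
The two suggestions can be combined as in the following minimal sketch. It deliberately avoids OpenCV so that it stays self-contained: `Mat3`, `Vec3`, `inverse3x3` and `backProjectMarked` are hypothetical names, the mask is assumed to be a flat row-major byte buffer, and only the Pixel-->Camera step of the loop is shown.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<double, 9>;  // row-major 3x3, stand-in for the camera matrix

// Invert a 3x3 matrix via the adjugate. Called once, outside the pixel loop.
Mat3 inverse3x3(const Mat3& m) {
    const double det = m[0] * (m[4] * m[8] - m[5] * m[7])
                     - m[1] * (m[3] * m[8] - m[5] * m[6])
                     + m[2] * (m[3] * m[7] - m[4] * m[6]);
    assert(std::abs(det) > 1e-12);  // the matrix must be invertible
    const double s = 1.0 / det;
    return {(m[4] * m[8] - m[5] * m[7]) * s, (m[2] * m[7] - m[1] * m[8]) * s, (m[1] * m[5] - m[2] * m[4]) * s,
            (m[5] * m[6] - m[3] * m[8]) * s, (m[0] * m[8] - m[2] * m[6]) * s, (m[2] * m[3] - m[0] * m[5]) * s,
            (m[3] * m[7] - m[4] * m[6]) * s, (m[1] * m[6] - m[0] * m[7]) * s, (m[0] * m[4] - m[1] * m[3]) * s};
}

// Back-project every marked mask pixel (value 255) to a camera ray.
// The result has one entry per pixel; unmarked pixels keep (0, 0, 0).
std::vector<Vec3> backProjectMarked(const std::vector<std::uint8_t>& mask,
                                    int rows, int cols, const Mat3& K) {
    const Mat3 Kinv = inverse3x3(K);  // hoisted: computed once per call, not once per pixel
    std::vector<Vec3> rays(mask.size(), Vec3{0.0, 0.0, 0.0});

    // Rows are independent and each iteration writes a distinct element,
    // so the outer loop can be split across threads.
    #pragma omp parallel for
    for (int row = 0; row < rows; ++row) {
        for (int col = 0; col < cols; ++col) {
            const std::size_t i = static_cast<std::size_t>(row) * cols + col;
            if (mask[i] != 255) continue;
            // Kinv * (col, row, 1)^T
            rays[i] = Vec3{Kinv[0] * col + Kinv[1] * row + Kinv[2],
                           Kinv[3] * col + Kinv[4] * row + Kinv[5],
                           Kinv[6] * col + Kinv[7] * row + Kinv[8]};
        }
    }
    return rays;
}
```

The pragma only takes effect when OpenMP is enabled at compile time (for example `-fopenmp` with GCC or Clang, `/openmp` with MSVC); otherwise it is ignored and the loop runs serially, which makes it easy to measure both variants.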

max0r
  • The inv() call only takes 3% of the total computation time, but I will still precompute it. The most expensive part is the dot product, at 16%. I don't think the functions themselves are slow; I simply call them very often (up to 100,000 iterations per frame). – Wurzelsepp Sep 09 '15 at 12:03
  • Are you sure? I just tested it on my machine. Inverting a `3x4` matrix is about 3 times slower than a dot product of two `3x1` vectors (inverting took about 0.3 ms and the dot product took about 9.9e-5 ms). – max0r Sep 09 '15 at 13:28
0

Can you try using cv::UMat and measure the performance?

I use OpenMP for fast per-pixel image operations.

lexxai
  • I simply tried changing Mat to UMat, but it didn't improve anything. I will try OpenMP as soon as I find the time. – Wurzelsepp Sep 10 '15 at 11:33
0

Have you considered changing your operations to float instead of double? Since you are on a mobile device it might help speed up operations.

Also, what is rect in the last if condition?

Utkarsh Sinha
  • Initially I used float, but some of my functions only accepted double. rect is just a rectangle with the size of my output image. This way I check whether my projected points are within the image. – Wurzelsepp Sep 10 '15 at 11:13
  • Try converting things back to float, and only convert to double for the specific function calls that need it. If possible, try to figure out the hardware optimizations for floating-point numbers on your platform. Also, would breaking each line into single subtasks help with profiling? I can see some lines doing matrix subtraction, a dot product and scalar multiplication in the same line. – Utkarsh Sinha Sep 10 '15 at 12:16
0

I managed to get the transform running for a 720p image in ~40 ms, simply by using Matx instead of Mat. The images are stored in UMat; maybe this helped too.
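
A plausible reason for the gain is that Matx is a fixed-size type whose dimensions are known at compile time, so small products need no heap allocation. The sketch below illustrates that idea for the Camera-->Pixel step only, using `std::array` as a stand-in for `cv::Matx33d` and `cv::Vec3d` so that it compiles without OpenCV. The names `Mat3`, `Vec3`, `Pixel`, `projectToPixel` and `insideImage` are hypothetical, and this is not the code that produced the timing above.

```cpp
#include <array>
#include <cassert>

// Fixed-size stand-ins for cv::Matx33d / cv::Vec3d: the storage lives on the
// stack and the dimensions are compile-time constants.
using Mat3 = std::array<double, 9>;  // row-major 3x3
using Vec3 = std::array<double, 3>;

struct Pixel {
    int x;
    int y;
};

// Camera --> Pixel: multiply by the camera matrix, then do the perspective
// divide. For a camera matrix whose last row is (0, 0, 1), w equals the
// depth X[2], which matches dividing by the z component.
Pixel projectToPixel(const Mat3& K, const Vec3& X) {
    const double u = K[0] * X[0] + K[1] * X[1] + K[2] * X[2];
    const double v = K[3] * X[0] + K[4] * X[1] + K[5] * X[2];
    const double w = K[6] * X[0] + K[7] * X[1] + K[8] * X[2];
    assert(w != 0.0);  // a point with zero depth cannot be projected
    // Truncate to integer pixel coordinates.
    return {static_cast<int>(u / w), static_cast<int>(v / w)};
}

// Same test as a rectangle at the origin with the size of the output image:
// x in [0, width) and y in [0, height).
bool insideImage(const Pixel& p, int width, int height) {
    return p.x >= 0 && p.x < width && p.y >= 0 && p.y < height;
}
```

With real OpenCV types the body would shrink to a `Matx33d * Vec3d` product followed by the divide; the point of the sketch is only that nothing in the per-pixel path allocates memory.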