I have 2D image data with the respective camera location in latitude and longitude. I want to translate pixel coordinates to 3D world coordinates. I have access to the intrinsic calibration parameters and to yaw, pitch and roll. From yaw, pitch and roll I can derive the rotation matrix, but I am not getting how to calculate the translation matrix. As I am working on a dataset, I don't have physical access to the camera. Please help me derive the translation matrix.
1 Answer
This cannot be done at all if you don't have the elevation of the camera with respect to the ground (AGL or ASL), or another way to resolve the scale from the image (e.g. by identifying in the image an object of known size, such as a soccer stadium in an aerial image).
Assuming you can resolve the scale, the next question is how precisely you can (or want to) model the terrain. For a first approximation you can use a standard geodetic ellipsoid (e.g. WGS-84). For higher precision, especially for images shot from lower altitudes, you will need to use a DTM and register it to the images. Either way, it is a standard back-projection problem: you compute the ray from the camera centre through the pixel, transform it into world coordinates, then intersect it with the ellipsoid or DTM.
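The ray-ellipsoid intersection step can be sketched as follows. This is not part of the original answer, just a minimal illustration assuming the camera centre and ray direction are already expressed in ECEF coordinates (metres); it substitutes the ray equation into the WGS-84 ellipsoid equation and solves the resulting quadratic:

```python
import numpy as np

# WGS-84 semi-axes in metres (standard values)
WGS84_A = 6378137.0          # equatorial radius
WGS84_B = 6356752.314245     # polar radius

def intersect_ray_ellipsoid(origin, direction):
    """Intersect a ray (ECEF origin, direction) with the WGS-84 ellipsoid
    x^2/a^2 + y^2/a^2 + z^2/b^2 = 1. Returns the nearest intersection
    point in ECEF, or None if the ray misses the ellipsoid."""
    a2 = WGS84_A ** 2
    b2 = WGS84_B ** 2
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    # Substituting p = o + t*d into the ellipsoid equation gives a
    # quadratic A*t^2 + B*t + C = 0 in the ray parameter t.
    A = d[0]**2 / a2 + d[1]**2 / a2 + d[2]**2 / b2
    B = 2.0 * (o[0]*d[0] / a2 + o[1]*d[1] / a2 + o[2]*d[2] / b2)
    C = o[0]**2 / a2 + o[1]**2 / a2 + o[2]**2 / b2 - 1.0
    disc = B*B - 4.0*A*C
    if disc < 0.0:
        return None                       # ray misses the ellipsoid
    t = (-B - np.sqrt(disc)) / (2.0*A)    # smaller root = nearest hit
    if t < 0.0:
        return None                       # ellipsoid is behind the camera
    return o + t * d
```

The nearer of the two quadratic roots is the visible surface point; the farther one lies on the back side of the Earth.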
There are plenty of open-source libraries to help you do that in various languages (e.g. GeographicLib).
Edited to add suggestions:
Express your camera location in ECEF. Transform the ray from the camera into ECEF as well, taking into account the camera rotation. You can do both transformations using a library, e.g. nVector.
Then proceed to intersect the ray with the ellipsoid, as explained in this answer.
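As an illustration of the first step (camera location to ECEF), here is a sketch using the standard closed-form WGS-84 formulas in plain NumPy rather than nVector; the function name is my own:

```python
import numpy as np

WGS84_A = 6378137.0                     # equatorial radius, metres
WGS84_F = 1.0 / 298.257223563           # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)    # first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert geodetic latitude/longitude (degrees) and altitude above
    the WGS-84 ellipsoid (metres) to ECEF coordinates (metres)."""
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude
    n = WGS84_A / np.sqrt(1.0 - WGS84_E2 * np.sin(lat)**2)
    x = (n + alt_m) * np.cos(lat) * np.cos(lon)
    y = (n + alt_m) * np.cos(lat) * np.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * np.sin(lat)
    return np.array([x, y, z])
```

The resulting ECEF position is exactly the translation you were missing: with the camera centre C in ECEF and the camera-to-world rotation R, the extrinsics follow as t = -R·C in the usual [R | t] convention.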

- Thanks for the valuable reply. My camera location is at the world centre. I calculate world coordinates for each pixel using the rotation matrix and the camera intrinsic matrix. Let C be the 3x3 camera matrix and R the 3x3 rotation matrix; I build R from the given yaw, pitch and roll parameters, i.e. R is the product of the yaw, pitch and roll rotations. I get world coordinates for each pixel by multiplying the image coordinates by the inverse camera matrix, then multiplying the result by the rotation matrix. I have the GPS value of the camera, and using it I can calculate a GPS value for each pixel. – Kamble Tanaji Apr 23 '19 at 13:29
- The above is the approach I am following, but I am still struggling to get an accurate result. Please let me know where I am going wrong. – Kamble Tanaji Apr 23 '19 at 13:31
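For reference, the pixel-to-ray computation described in the comments above can be sketched like this; the yaw-pitch-roll composition order and axis convention here are assumptions (a mismatched convention against the dataset's definition is a very common cause of inaccurate results):

```python
import numpy as np

def rotation_ypr(yaw, pitch, roll):
    """Rotation matrix from yaw (about Z), pitch (about Y) and roll
    (about X), all in radians, composed as Rz(yaw) @ Ry(pitch) @ Rx(roll).
    This order/axis convention is an assumption; check your dataset's."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return Rz @ Ry @ Rx

def pixel_ray(K, R, u, v):
    """Unit direction of the ray through pixel (u, v) in the world frame.
    K is the 3x3 intrinsic matrix, R rotates camera frame to world frame."""
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # camera-frame ray
    d_world = R @ d_cam
    return d_world / np.linalg.norm(d_world)
```

Note that this only yields a direction, not a 3D point: every point along the ray projects to the same pixel, which is why the camera altitude (or some other scale constraint) is needed to pin down the intersection with the ground.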
- Edited with suggestions. An upvote/accept is appreciated if you find the answer useful. – Francesco Callari Apr 24 '19 at 03:52
- How could I forget that? I have tried many times, but unfortunately I am not eligible to upvote: my reputation score is too low, so my upvote may not count. – Kamble Tanaji Apr 24 '19 at 05:43
- I just checked my database and I do have access to the altitude. Would you guide me on getting the translation matrix using the altitude? – Kamble Tanaji Apr 24 '19 at 06:01
- I think you are commenting about aerial images. My images are 2D images captured by a normal camera. – Kamble Tanaji Apr 25 '19 at 11:16