We crawled a set of images from the Google Street View (GSV) API. I want to estimate the 3D world coordinates of points in a 2D image, given the following:
1. The GPS location (i.e., latitude and longitude) of the camera capturing the image
Conversion of GPS coordinates to a translation matrix: we used two conversion methods to get the translation matrix -> UTM conversion and conversion to Cartesian coordinates.
- UTM conversion: used Python's `utm` library to convert the GPS coordinates to UTM coordinates, then used the easting and northing values with a fixed height to build the translation matrix.
- Cartesian conversion: used the following formulas (with latitude and longitude converted to radians) to generate the translation matrix:
x = Radius*math.cos(latitude)*math.cos(longitude)
y = Radius*math.cos(latitude)*math.sin(longitude)
z = Radius*math.sin(latitude)
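A minimal sketch of the Cartesian conversion above (the UTM path simply takes the easting/northing pair returned by `utm.from_latlon`). `EARTH_RADIUS_M` is an assumed constant, since the question does not state which radius was used:

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed: mean Earth radius in meters

def gps_to_cartesian(lat_deg, lon_deg, radius=EARTH_RADIUS_M):
    """Convert GPS coordinates (degrees) to spherical Cartesian coordinates.

    math.cos/math.sin expect radians, so the degree values from the
    GSV API must be converted first.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)
    return (x, y, z)

# Translation vector for a camera at lat=0, lon=0:
t = gps_to_cartesian(0.0, 0.0)  # → (6371000.0, 0.0, 0.0)
```

Note that this places the origin at the Earth's center; for a local scene it is common to subtract a reference point so the translation values stay small.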
2. The rotation matrix calculated using OpenSfM (i.e., an SfM algorithm).
The library provides alpha, beta, and gamma angles (in radians), which map to the yaw, pitch, and roll angles, respectively. The rotation matrix is constructed using the formula from http://planning.cs.uiuc.edu/node102.html:
Rotation Matrix (R): R(alpha, beta, gamma)= R_z (alpha) * R_y (beta) * R_x (gamma)
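The composition above can be sketched as follows, directly multiplying the three elementary rotations (angles in radians, matching the linked reference):

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """R = R_z(alpha) @ R_y(beta) @ R_x(gamma), with alpha/beta/gamma
    as yaw/pitch/roll in radians (planning.cs.uiuc.edu/node102.html)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rz = np.array([[ca, -sa, 0.0],
                   [sa,  ca, 0.0],
                   [0.0, 0.0, 1.0]])
    Ry = np.array([[ cb, 0.0, sb],
                   [0.0, 1.0, 0.0],
                   [-sb, 0.0, cb]])
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0,  cg, -sg],
                   [0.0,  sg,  cg]])
    return Rz @ Ry @ Rx
```

As a sanity check, the result should always be orthonormal (R @ R.T = I), and zero angles should give the identity.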
3. Based on the field-of-view angle and the dimensions of the image, we estimate the calibration matrix as follows (https://codeyarns.com/2015/09/08/how-to-compute-intrinsic-camera-matrix-for-a-camera/):
K = [[f_x, s, x], [0, f_y, y], [0, 0, 1]]
where s is the skew (assumed 0), and x and y are half of the image dimensions (i.e., x = width/2 and y = height/2).
The GSV API provides the field-of-view angle θ in degrees (e.g., 45 or 80), so after converting θ to radians the focal lengths can be calculated as
f_x = x/tan(θ/2)
f_y = y/tan(θ/2)
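A sketch of the calibration-matrix construction above; the zero-skew default and the use of the same θ for both axes are assumptions taken from the formulas as stated:

```python
import math
import numpy as np

def intrinsic_matrix(width, height, fov_deg, skew=0.0):
    """Build K from the GSV field-of-view angle (degrees) and image size.

    Applies f = (dim/2) / tan(θ/2), with θ converted to radians
    before calling tan().
    """
    x, y = width / 2.0, height / 2.0          # principal point
    half_fov = math.radians(fov_deg) / 2.0
    f_x = x / math.tan(half_fov)
    f_y = y / math.tan(half_fov)
    return np.array([[f_x, skew, x],
                     [0.0, f_y,  y],
                     [0.0, 0.0, 1.0]])

# A 90° FOV on a 640x640 image gives a focal length of 320 px:
K = intrinsic_matrix(640, 640, 90)
```

Note that if θ is the horizontal FOV and the image is not square, deriving f_y from the image height this way implicitly assumes non-square pixels; with square pixels one would instead set f_y = f_x.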
Using the matrices T, R, and K, how can we estimate the 3D world coordinates of each pixel in the 2D image?