
Sorry for my bad English. I have the following problem:


Let's say the camera of my mobile device is showing this picture.

In the picture you can see 4 different positions. Every position is known to me (longitude, latitude).

Now I want to know where in the picture a specific position is. For example, I want to have a rectangle 20 meters in front of me and 5 meters to my left. I only know the latitude/longitude of this point, but I don't know where I have to place it inside the picture (x, y). For example, POS3 is at (0, 400) in my view, POS4 is at (600, 400), and so on.

Where do I have to put the new point, which is 20 meters in front of me and 5 meters to my left? (So my input is (LatXY, LonXY), and my result should be (x, y) on the screen.)

I also have the height of the camera and the rotation angles about the camera's x, y, and z axes.

Can I use simple mathematical operations to solve this problem?

Thank you very much!

Frame91
  • Your question is not clear. If you know the position is 20 metres ahead and 5 to the left, what *exactly* are you trying to calculate? – Simon Mar 27 '13 at 22:10
  • Sorry, I would like to know where the position is inside the picture. I need a point (x, y) inside my view where I can "mark" the position. – Frame91 Mar 27 '13 at 22:19

3 Answers


I see a couple of problems.

The only real mistake is that you're scaling your projection up by _canvasWidth/2 etc. instead of translating that far from the principal point. Add those values to the projected result; multiplying is like "zooming" that far into the projection.
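
A minimal sketch of the fix in Python (the names are illustrative, and it assumes your projected coordinates are already in pixel units with the principal point at the origin):

```python
def to_screen(px, py, canvas_width, canvas_height):
    """Map a projected point (principal point at origin) to screen pixels."""
    # Translate by half the canvas, i.e. re-center on the principal point.
    # Multiplying by canvas_width / 2 instead would scale ("zoom") the
    # projection rather than re-center it.
    screen_x = px + canvas_width / 2.0
    screen_y = canvas_height / 2.0 - py  # flip y: screen y grows downward
    return screen_x, screen_y
```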

Second, dealing in a global Cartesian coordinate space is a bad idea. With the formulae you're using, the difference between (60.1234, 20.122) and (60.1235, 20.122) (i.e. a small latitude difference) causes changes of similar magnitude in all 3 axes, which doesn't feel right.

It's more straightforward to take the same approach as computer graphics: set your camera as the origin of your "camera space", and convert between world objects and camera space by getting the haversine distance (or similar) between your camera location and the location of the object. See here: http://www.movable-type.co.uk/scripts/latlong.html
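
A minimal sketch of that conversion, using the haversine and bearing formulas from the linked page (Python for brevity; an Android app would do the same arithmetic in Java):

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def initial_bearing(lat1, lon1, lat2, lon2):
    """Bearing from point 1 to point 2, in radians clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlam)
    return math.atan2(y, x)

def east_north_offset(cam_lat, cam_lon, obj_lat, obj_lon):
    """Object offset from the camera in metres (east, north); rotate by the
    camera's heading to get camera-space coordinates."""
    d = haversine_m(cam_lat, cam_lon, obj_lat, obj_lon)
    b = initial_bearing(cam_lat, cam_lon, obj_lat, obj_lon)
    return d * math.sin(b), d * math.cos(b)
```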

Third, your perspective projection calculations assume an ideal pinhole camera, which you probably do not have. It will only be a small correction, but to be accurate you need to additionally apply the projection that corresponds to the intrinsic camera parameters of your camera. There are two ways to accomplish this: you can apply it as a post-multiplication to the scheme you already have, or you can change from multiplying by a 3x3 matrix to using a full 4x4 camera matrix with the intrinsic parameters in it: http://en.wikipedia.org/wiki/Camera_matrix

Using this approach, the perspective projection is symmetric about the origin: if you don't check for z depth, you'll project points behind you onto your screen as if they were the same z distance in front of you.
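
Here is a sketch of both points in Python with NumPy. The intrinsic values are hypothetical placeholders (real numbers come from calibrating your camera), and the z check guards against the mirroring just described:

```python
import numpy as np

# Hypothetical intrinsics: focal lengths in pixels and the principal point.
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

def project(point_cam):
    """Project a 3D point in camera coordinates (z forward) to pixels.

    Returns None for points at or behind the camera, which would otherwise
    be mirrored onto the screen as if they were in front of you.
    """
    x, y, z = point_cam
    if z <= 0.0:  # the z-depth check
        return None
    u, v, w = K @ np.array([x, y, z])
    return u / w, v / w
```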

Then lastly, I'm not sure about the Android APIs, but make sure you're getting a true north bearing and not a magnetic north bearing; some platforms return either, depending on an argument or configuration. (And check that your angles are in degrees or radians, whichever the APIs want, etc. Silly things, but I've lost hours debugging less :) ).

dabhaid
  • Woooah! Thanks for the really informative reply! I'll follow your ideas and test it again ;). But I still have one open question: where in these formulas should I use the horizontal and vertical view angles of my camera? I don't think I can get any good result without using those values, can I? Thanks again! – Frame91 Apr 02 '13 at 21:22
  • If by view angles you mean the angles of view of the lens, those are (to an approximation) determined by the focal length and the screen resolution. Adding the focal length to the camera matrix will "fix" this. (If you had a square display and the view of your sensor was cropped accordingly, the angles would be equal)... – dabhaid Apr 03 '13 at 10:13
  • ...Another way is to just use basic trigonometry and relative bearings: from the center point of your view to the edge is a right triangle with angle 1/2 the viewing angle (let's call it alpha), which occupies X onscreen pixels. Your virtual point is beta degrees relative to the center. The onscreen location u of your virtual point is tan(beta)/tan(alpha) * X; repeat for both axes (no projection matrix required, you're doing it manually; see the sketch after these comments). I can't remember if relative bearing is more accurate at short distances using the camera-at-origin approach or using spherical coordinate bearing with GPS coords. – dabhaid Apr 03 '13 at 10:17
  • ...And I say "to an approximation" because if you're using a mobile phone they mostly have wide-screen displays that may have a non-linear distortion in one or both axes - to really correct for that you'd have to do some camera calibration, but if you're using GPS for location and basic phone sensors for position there's no point - your location and position precision will be rather poor to begin with. – dabhaid Apr 03 '13 at 10:18
  • Thanks again for your answers! I've rewarded it with a bounty ;) It looks like you have been doing something like this. Do you have some open-source code I can take a look at? Thank you very much :) – Frame91 Apr 05 '13 at 21:09
  • No open source code unfortunately, though I've been meaning to do that for a long time - questions about perspective projection are very frequent here. – dabhaid Apr 07 '13 at 08:51
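
To make the relative-bearing trick from the comments above concrete, here is a minimal Python sketch (illustrative names, ideal pinhole model; apply it once per axis):

```python
import math

def screen_offset(beta_deg, alpha_deg, half_extent_px):
    """Pixel offset from the screen centre for a point beta degrees off the
    view axis, given a half angle of view alpha that spans half_extent_px
    pixels on screen."""
    return math.tan(math.radians(beta_deg)) / math.tan(math.radians(alpha_deg)) * half_extent_px
```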

The answer you want will depend on the accuracy of the result you need. As dabhaid pointed out, nonlinearity in the image sensor and other factors, such as atmospheric distortion, may introduce errors, but these would be difficult problems to solve with different cameras, etc., on different devices. So let's start by getting a reasonable approximation which can be tweaked as more accuracy is needed.

First, you may be able to ignore the directional information from the device, if you choose. If you have the five locations (POS1 through POS4 and the camera) in a consistent set of coordinates, you have all you need. In fact, you don't even need all those points.

A note on consistent coordinates: at this scale, once you convert the latitude and longitude to meters, using cos(lat) as the scaling factor for longitude, you should be able to treat everything from a "flat earth" perspective. You then just need to remember that the camera's x-y plane is roughly the global x-z plane.
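
As a sketch (Python; the metres-per-degree constant is approximate, which is fine at this scale):

```python
import math

METERS_PER_DEG = 111320.0  # approx. metres per degree of latitude

def latlon_to_meters(lat, lon, origin_lat, origin_lon):
    """Flat-earth offsets in metres from an origin, scaling longitude by cos(lat)."""
    north = (lat - origin_lat) * METERS_PER_DEG
    east = (lon - origin_lon) * METERS_PER_DEG * math.cos(math.radians(origin_lat))
    return east, north
```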

Conceptual Background

The diagram below lays out the projection of the points onto the image plane. The dz used for perspective can be derived directly using the proportion of the distance in view between far points and near points vs. their physical distance. In the simple case where the line POS1 to POS2 is parallel to the line POS3 to POS4, the perspective factor is just the ratio of the scaling of the two lines:

Scale (POS1, POS2) = pixel distance (pos1, pos2) / Physical distance (POS1, POS2)
Scale (POS3, POS4) = pixel distance (pos3, pos4) / Physical distance (POS3, POS4)
Perspective factor = Scale (POS3, POS4) / Scale (POS1, POS2)

So the perspective factor to apply to a vertex of your rect would be the proportion of the distance to the vertex between the lines. Simplifying:

Factor(rect) ~= [(Rect.z - (POS3, POS4).z) / ((POS1, POS2).z - (POS3, POS4).z)] * Perspective factor
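
In code, those formulas are just a few ratios (Python sketch, illustrative names):

```python
def perspective_factor(pix_12, phys_12, pix_34, phys_34):
    """Ratio of the image scales of the near line (POS3, POS4)
    and the far line (POS1, POS2)."""
    scale_12 = pix_12 / phys_12  # Scale(POS1, POS2)
    scale_34 = pix_34 / phys_34  # Scale(POS3, POS4)
    return scale_34 / scale_12

def factor_for_vertex(rect_z, z_12, z_34, pf):
    """Interpolate the factor for a rect vertex at depth rect_z
    between the two reference lines."""
    return (rect_z - z_34) / (z_12 - z_34) * pf
```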

Answer

A perspective transformation is linear with respect to the distance from the focal point in the direction of view. The diagram below is drawn with the X axis parallel to the image plane, and the Y axis pointing in the direction of view. In this coordinate system, for any point P and an image plane any distance from the origin, the projected point p has an X coordinate p.x which is proportional to P.x/P.y. These values can be linearly interpolated.

In the diagram, tp is the desired projection of the target point. To get tp.x, interpolate between, for example, pos1.x and pos3.x, adjusting for the distance as follows:

tp.x = pos1.x + (pos3.x - pos1.x) * ((TP.x/TP.y) - (POS1.x/POS1.y)) / ((POS3.x/POS3.y) - (POS1.x/POS1.y))
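
As a Python sketch (points are (X, Y) tuples in the camera coordinates of the diagram, Y pointing in the view direction; pix1_x and pix3_x are the known screen x of POS1 and POS3):

```python
def interpolate_screen_x(tp, pos1, pos3, pix1_x, pix3_x):
    """Screen x of target point TP, interpolated in the perspective-linear
    quantity X/Y between two reference points with known screen positions."""
    r_tp = tp[0] / tp[1]
    r1 = pos1[0] / pos1[1]
    r3 = pos3[0] / pos3[1]
    return pix1_x + (pix3_x - pix1_x) * (r_tp - r1) / (r3 - r1)
```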

The advantage of this approach is that it does not require any prior knowledge of the angle viewed by each pixel, and it will be relatively robust against reasonable errors in the location and orientation of the camera.

Further refinement

Using more data means being able to compensate for more errors. With multiple points in view, the camera location and orientation can be calibrated using the Tienstra method. A concise proof of this approach (using barycentric coordinates) can be found here.

Since the transformations required are all linear in homogeneous coordinates, you could apply barycentric coordinates to interpolate based on any three or more points, given their X, Y, Z, W coordinates in homogeneous 3-space and their (x, y) coordinates in image space. The closer the points are to the destination point, the less significant the nonlinearities are likely to be, so in your example you would use POS1 and POS3, since the rect is on the left, and POS2 or POS4 depending on the relative distance.

(Barycentric coordinates are likely most familiar as the method used to interpolate colors on a triangle (fragment) in 3D graphics.)

Edit: Barycentric coordinates still require the W homogeneous coordinate factor, which is another way of expressing the perspective correction for the distance from the focal point. See this article on GameDev for more details.
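
One way to express that W weighting for this problem, as a Python sketch (t is the fraction of the way in world space between two reference points; s0, s1 are their screen x coordinates and w0, w1 their homogeneous W, i.e. distance along the view direction):

```python
def screen_x_of_world_point(t, s0, w0, s1, w1):
    """Screen x of the point a world-space fraction t along the segment
    between two projected reference points. Weighting the endpoints by
    their W is the perspective correction; plain interpolation of s0
    and s1 would put the world midpoint at the screen midpoint."""
    return ((1.0 - t) * s0 * w0 + t * s1 * w1) / ((1.0 - t) * w0 + t * w1)
```

With w0 = 0.2 m and w1 = 120 m (the example from the comments below), t = 0.5 lands almost on top of the far point's screen position, which is the behavior a naive midpoint misses.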

Two related SO questions: perspective correction of texture coordinates in 3d and Barycentric coordinates texture mapping. This diagram may help in explaining the interpolation of image coordinates based on global coordinates.

Steven McGrath
  • Hey. If I understand your answer right, I just have to interpolate between the given points. Well... I'm not quite sure this gives me good values. For example, in my picture, the position halfway between P1 and P3 (I mean halfway along the actual way from P1 to P3, in meters, not on the screen) should be 1-2 centimeters under the location of P3 in the picture. If I use your advice, the location would be exactly in the middle of P1 and P3. Translated into meters, the "new" position would be approx. 2-3 meters in front of P1. – Frame91 Apr 06 '13 at 23:25
  • I think my answer was very unclear. I meant the following: let's say P3 is 20 cm in front of the camera and P1 is approx. 120 meters in front of the camera. Now I want to find PX, which should be 60 meters in front of the camera. With your solution it would be in the middle of P1 and P3, but it should be somewhere next to P1. – Frame91 Apr 06 '13 at 23:31
  • I'm sorry I wasn't clearer. The interpolation needs to be weighted by the distance from the focal point. This is the purpose of the W coordinate in homogeneous coordinates. I will try to find a clearer reference and update my answer. – Steven McGrath Apr 07 '13 at 06:23
  • I've updated my answer to clarify how interpolation is applied. Sorry it wasn't clear earlier! – Steven McGrath Apr 07 '13 at 11:59
  • This is just amazing! I don't know how to thank you! Thanks a lot! – Frame91 Apr 07 '13 at 12:11
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/27741/discussion-between-dalanie91-and-steven-mcgrath) – Frame91 Apr 07 '13 at 22:40
  • I can't thank you enough! This chat was absolutely amazing. Thanks for your help! – Frame91 Apr 08 '13 at 00:17
  • It's a while since you've answered this, but I'm wondering how I can easily calculate tp.y. You said "For height, it is the same, but as if you rotated the image 90 degrees to the left, assuming your screen coordinates start in the upper left." Can you describe what exactly you meant by that? ;) – Frame91 May 22 '13 at 19:54
  • On many, but not all, systems, the upper left corner of your screen has the (x,y) coordinate (0,0), with positive values of y going down and positive values of x going to the right. Rotating 90 degrees puts the positive y axis of the image in the place of the positive x axis, and the tp.x calculation will then yield vertical distance (-height, or tp.y). Try rotating your photograph, and I think you'll see what I mean. I hope this makes sense! – Steven McGrath May 23 '13 at 05:39
  • Hey, thanks for the fast reply! I understand what you mean by rotating the image, but if I rotate it, I have to recalculate the distances of the points, don't I? Thanks again ;) – Frame91 May 23 '13 at 09:52

If you know the points in the camera frame and the real world coordinates, some simple linear algebra will suffice. A package like OpenCV will have this type of functionality, or alternatively you can create the projection matrices yourself:

http://en.wikipedia.org/wiki/3D_projection

Once you have a set of point correspondences, it is as simple as filling in a few vectors and solving the system of equations, which gives you a projection matrix (since the 4 points can be assumed planar, a plane-to-plane projection is enough). Multiply any 3D coordinate by it to find the corresponding 2D image plane coordinate.
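
A minimal OpenCV sketch of this in Python. The world coordinates below are made-up placeholders; the pixel values reuse the question's example for POS3 and POS4 and are invented for POS1 and POS2:

```python
import numpy as np
import cv2  # OpenCV

# Hypothetical ground coordinates in metres relative to the camera
# (x to the right, y forward), paired with their pixel locations.
world = np.float32([[-5.0, 40.0],    # POS1
                    [ 5.0, 40.0],    # POS2
                    [-5.0, 10.0],    # POS3
                    [ 5.0, 10.0]])   # POS4
image = np.float32([[200.0, 100.0],  # POS1 on screen
                    [400.0, 100.0],  # POS2 on screen
                    [  0.0, 400.0],  # POS3 on screen (from the question)
                    [600.0, 400.0]]) # POS4 on screen (from the question)

# Four planar correspondences determine a 3x3 projection (homography).
H = cv2.getPerspectiveTransform(world, image)

# The target: 20 m in front of the camera and 5 m to the left.
target = np.float32([[[-5.0, 20.0]]])
u, v = cv2.perspectiveTransform(target, H)[0, 0]
print(u, v)  # the (x, y) screen position of the target
```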

munch1324
  • Thank you for the answer! Can you give me a little hint on where I have to start? I'm not very familiar with projections and I don't know what I have to do :( – Frame91 Mar 27 '13 at 22:52
  • Here is a link to a description of the process (if done by hand): http://stackoverflow.com/questions/8925569/perspective-projection-4-points . I apologize, but my computer graphics class was a few years ago. OpenCV is a strong computer vision library that has a calibration feature: http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html The main terms you should search for are: camera calibration, projection matrix, system of equations. – munch1324 Mar 28 '13 at 00:34
  • Hey, I've edited my question and now have a problem with the projection matrix. Also, I've added a bounty on this question ;). Can you take a look? Thanks :) – Frame91 Mar 29 '13 at 22:06