4

I am implementing Haversine formula in PHP as the follows

$result=mysqli_query($mysqli,"SELECT *,( 6371 * acos( cos( radians({$lat}) ) * cos( radians( `latitude` ) ) * cos( radians( `longitude` ) -radians({$lon}) ) +sin( radians({$lat}) ) * sin( radians( `latitude` ) ) ) ) AS distance FROM `places` HAVING distance <= {$radius} ORDER BY distance ASC") or die(mysqli_error($mysqli));

And Inside the Haversine fetch loop, I have the query that iterates through the results of haversine to select records that matches the IDs returned by the haversine formula. The Query is as follows.

 while($row = mysqli_fetch_assoc($result)) 

    {
   $rest_time=$row['id'];

$result1=mysqli_query($mysqli,"SELECT * FROM my_friends  WHERE personal_id='".$personal_id."' AND id='".$rest_time."'") or die(mysqli_error($mysqli)); 

//Some operations here
    }

HOw can I perform Join operation to mingle these queries into a single one? WOuld it be wise to do so, from the optimisation point of view if the second table has, like 50k users and the first table has almost 1000 records?

Ashif Shereef
  • 454
  • 1
  • 8
  • 24
  • 4
    **WARNING**: When using `mysqli` you should be using [parameterized queries](http://php.net/manual/en/mysqli.quickstart.prepared-statements.php) and [`bind_param`](http://php.net/manual/en/mysqli-stmt.bind-param.php) to add user data to your query. **DO NOT** use string interpolation or concatenation to accomplish this because you have created a severe [SQL injection bug](http://bobby-tables.com/). **NEVER** put `$_POST` or `$_GET` data directly into a query, it can be very harmful if someone seeks to exploit your mistake. – tadman Oct 23 '16 at 04:10
  • @tadman you presume that the variables that appear in the query have not been sanitized, but this is not necessarily the case. – Walter Tross Oct 27 '16 at 22:10
  • @WalterTross When you *assume* anything that's when you have problems. This is too dangerous to treat lightly. Things must be *obviously* escaped, or there's no proof that they are. – tadman Oct 28 '16 at 01:19
  • 1
    @tadman you wrote "...you have created a severe SQL injection bug". Apart from the fact that I would call it a vulnerability, not a bug, you should also write "unless you can guarantee that the variables you are using are numbers or escaped strings". Having been for years in a company where we DID guarantee that (and were not using `mysqli`), I consider it an overshoot to directly point out a bug in this case. But of course you are right regarding the fact that string concatenation and interpolation should be discouraged now that `mysqli` is the standard. – Walter Tross Oct 28 '16 at 08:34
  • 1
    I have used paramterized queries and bind_param. I was just showoing you the formula. This discussion is going in a totally different way. – Ashif Shereef Oct 28 '16 at 10:13
  • @AshifShereef Try to include that code in your example, it avoids distractions and helps to promote best practices. – tadman Oct 28 '16 at 19:24

2 Answers2

5

Any operation you do here which operates on all rows will be slow with that many records.

What you need to do is take advantage of index's. To use an index it must be a simple query and NOT the result of a function (as it is currently).

What you are doing by doing a radius search is making a circle around a point, By using some trig before the circle is made we can come up with the following

Circle two squares

where S1 is the largest square inside, and S2 is the smallest square outside.

Now we can work out the dimensions of these two squares and anything OUTSIDE of S2 is hit by and index, and anything INSIDE of S1 is hit by an index, leaving only the small area inside which now needs to be looked up using the slow method.

If you need the distance from the point ignore the S1 sections (as everything inside of the circle needs the haversine function) as a note here, while everything inside of the circle needs it, not every point is within the distance, so both WHERE clauses are still needed

So lets calculate these points using the unit circle Unit Circle

function getS1S2($latitude, $longitude, $kilometer)
{
    $radiusOfEarthKM  = 6371;
    $latitudeRadians  = deg2rad($latitude);
    $longitudeRadians = deg2rad($longitude);
    $distance         = $kilometer / $radiusOfEarthKM;

    $deltaLongitude = asin(sin($distance) / cos($latitudeRadians));

    $bounds = new \stdClass();

    // these are the outer bounds of the circle (S2)
    $bounds->minLat  = rad2deg($latitudeRadians  - $distance);
    $bounds->maxLat  = rad2deg($latitudeRadians  + $distance);
    $bounds->minLong = rad2deg($longitudeRadians - $deltaLongitude);
    $bounds->maxLong = rad2deg($longitudeRadians + $deltaLongitude);

    // and these are the inner bounds (S1)
    $bounds->innerMinLat  = rad2deg($latitudeRadians  + $distance       * cos(5 * M_PI_4));
    $bounds->innerMaxLat  = rad2deg($latitudeRadians  + $distance       * sin(M_PI_4));
    $bounds->innerMinLong = rad2deg($longitudeRadians + $deltaLongitude * sin(5 * M_PI_4));
    $bounds->innerMaxLong = rad2deg($longitudeRadians + $deltaLongitude * cos(M_PI_4));

    return $bounds;
}

Now your query becomes

SELECT 
  *
FROM
  `places` 
HAVING p.nlatitude BETWEEN {$bounds->minLat} 
  AND {$bounds->maxLat} 
  AND p.nlongitude BETWEEN {$bounds->minLong} 
  AND {$bounds->maxLong} 
  AND (
    (
      p.nlatitude BETWEEN {$bounds->innerMinLat} 
      AND {$bounds->innerMaxLat} 
      AND p.nlongitude BETWEEN {$bounds->innerMinLong} 
      AND {$bounds->innerMaxLong}
    ) 
    OR (
      6371 * ACOS(
        COS(RADIANS({ $lat })) * COS(RADIANS(`latitude`)) * COS(
          RADIANS(`longitude`) - RADIANS({ $lon })
        ) + SIN(RADIANS({ $lat })) * SIN(RADIANS(`latitude`))
      )
    )
  )) <= {$radius} 
ORDER BY distance ASC 

IMPORTANT

The above has text for readability, Please ensure these values are escaped correctly / preferably parameterized

This then can take advantage of the index, and allow the join to happen in a faster time

Adding the join this becomes

SELECT 
  *
FROM
  `places` p
  INNER JOIN my_friends f ON f.id = p.id
WHERE   p.latitude BETWEEN {$bounds->minLat} 
  AND {$bounds->maxLat} 
  AND p.longitude BETWEEN {$bounds->minLong} 
  AND {$bounds->maxLong} 
  AND (
    (
      p.latitude BETWEEN {$bounds->innerMinLat} 
      AND {$bounds->innerMaxLat} 
      AND p.longitude BETWEEN {$bounds->innerMinLong} 
      AND {$bounds->innerMaxLong}
    ) 
    OR (
      6371 * ACOS(
        COS(RADIANS({ $lat })) * COS(RADIANS(`latitude`)) * COS(
          RADIANS(`longitude`) - RADIANS({ $lon })
        ) + SIN(RADIANS({ $lat })) * SIN(RADIANS(`latitude`))
      )
    )
  )  <= {$radius} 
  AND f.personal_id = {$personal_id}
ORDER BY distance ASC 

IMPORTANT

The above has text for readability, Please ensure these values are escaped correctly / preferably parameterized

Assuming you have the correct indexes this query should remain fast and allow you to do the join.

Looking at the code above im not sure where personal_id comes from so have left as it is

if you need the distance from the query, you can remove the S1 square

    (
      p.latitude BETWEEN {$bounds->innerMinLat} 
      AND {$bounds->innerMaxLat} 
      AND p.longitude BETWEEN {$bounds->innerMinLong} 
      AND {$bounds->innerMaxLong}
    ) 

and move the second part of that OR

  6371 * ACOS(
    COS(RADIANS({ $lat })) * COS(RADIANS(`latitude`)) * COS(
      RADIANS(`longitude`) - RADIANS({ $lon })
    ) + SIN(RADIANS({ $lat })) * SIN(RADIANS(`latitude`))
  )

back to the select, which still makes use of S2.

I would also make sure to remove the "magic number" in the query 6371 is the radius of the earth in Kilometer

Community
  • 1
  • 1
exussum
  • 18,275
  • 8
  • 32
  • 65
  • Consider that I don't want to join the tables. I am looping through the result of the query as in your answer. However, I need to have the "distance" property too like I can fetch with row['distance']. How can I get that from your query? – Ashif Shereef Oct 31 '16 at 08:57
  • Simply by adding the same calculation from your original query to the select. The question was about having the join though ? – exussum Oct 31 '16 at 21:06
  • So If I remove the s1 Square and then move the calculation part to the initial select *, then would it still make use of the bounds? – Ashif Shereef Nov 01 '16 at 03:14
  • @exussem In order to use the distance, wouldn't it be better to 'move' the haversine to the select rather than add? – digout Jun 13 '18 at 11:56
  • The haversine will run for all of the rows then, removing the smaller square optimisation. What does work fairly well and is quick is using Pythagoras. – exussum Jun 13 '18 at 14:01
1

In this case, put the first query as a derived subquery in the second:

SELECT  p.*, f.*    -- Select only the columns you need, not all
    FROM  
    (
        SELECT  *,
                ( 6371 * acos( cos( radians({$lat}) ) * cos( radians( `latitude` ) )
                  * cos( radians( `longitude` ) -radians({$lon}) )
                  +sin( radians({$lat}) ) * sin( radians( `latitude` ) ) )
                ) AS distance
            FROM  `places`
            HAVING  distance <= {$radius}
            ORDER BY  distance ASC"
            LIMIT 10               -- Didn't you forget this??
    ) AS p
    JOIN  my_friends AS f  ON f.personal_id p.personal_id
      AND  id='".$rest_time."'"     -- Huh??
Rick James
  • 135,179
  • 13
  • 127
  • 222