I'm looking for a way to get HAVERSINE() in BigQuery. For example, how to get the closest weather stations to an arbitrary point?
Asked
Active
Viewed 7,854 times
11
-
ref: https://twitter.com/joaocorreia/status/827638555035840512 – Felipe Hoffa Feb 06 '17 at 16:57
2 Answers
17
2019 update: BigQuery now has a native ST_DISTANCE()
function, which is more accurate than Haversine.
For example:
#standardSQL
CREATE TEMP FUNCTION RADIANS(x FLOAT64) AS (
ACOS(-1) * x / 180
);
CREATE TEMP FUNCTION RADIANS_TO_KM(x FLOAT64) AS (
111.045 * 180 * x / ACOS(-1)
);
CREATE TEMP FUNCTION HAVERSINE(lat1 FLOAT64, long1 FLOAT64,
lat2 FLOAT64, long2 FLOAT64) AS (
RADIANS_TO_KM(
ACOS(COS(RADIANS(lat1)) * COS(RADIANS(lat2)) *
COS(RADIANS(long1) - RADIANS(long2)) +
SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))))
);
SELECT
lat,
lon,
name,
HAVERSINE(40.73943, -73.99585, lat, lon) *1000 AS haversine_distance
, ST_DISTANCE(
ST_GEOGPOINT(-73.99585, 40.73943)
, ST_GEOGPOINT(lon,lat)) bqgis_distance
FROM `bigquery-public-data.noaa_gsod.stations`
WHERE lat IS NOT NULL AND lon IS NOT NULL
ORDER BY 1 DESC
LIMIT 4;
Using standard SQL you can define a SQL function to encapsulate the logic. For example,
#standardSQL
CREATE TEMP FUNCTION RADIANS(x FLOAT64) AS (
ACOS(-1) * x / 180
);
CREATE TEMP FUNCTION RADIANS_TO_KM(x FLOAT64) AS (
111.045 * 180 * x / ACOS(-1)
);
CREATE TEMP FUNCTION HAVERSINE(lat1 FLOAT64, long1 FLOAT64,
lat2 FLOAT64, long2 FLOAT64) AS (
RADIANS_TO_KM(
ACOS(COS(RADIANS(lat1)) * COS(RADIANS(lat2)) *
COS(RADIANS(long1) - RADIANS(long2)) +
SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))))
);
SELECT
lat,
lon,
name,
HAVERSINE(40.73943, -73.99585, lat, lon) AS distance_in_km
FROM `bigquery-public-data.noaa_gsod.stations`
WHERE lat IS NOT NULL AND lon IS NOT NULL
ORDER BY distance_in_km
LIMIT 4;

Felipe Hoffa
- 54,922
- 16
- 151
- 325

Elliott Brossard
- 32,095
- 2
- 67
- 99
-
1I had to add a `CASE WHEN lat1 = lat2 AND long1 = long2 THEN 0` in HAVERSINE to avoid errors when computing locations that are exactly the same. – Tim Swast Jun 19 '18 at 01:26
7
2018 update: BigQuery now supports native geo functions.
ST_DISTANCE: Returns the shortest distance in meters between two non-empty GEOGRAPHYs.
Distance between NY and Seattle:
#standardSQL
WITH geopoints AS (
SELECT ST_GEOGPOINT(lon,lat) p, name, state
FROM `bigquery-public-data.noaa_gsod.stations`
)
SELECT ST_DISTANCE(
(SELECT p FROM geopoints WHERE name='PORT AUTH DOWNTN MANHATTAN WA'),
(SELECT p FROM geopoints WHERE name='SEATTLE')
)
3866381.55
Legacy SQL solution (standard pending):
SELECT lat, lon, name,
(111.045 * DEGREES(ACOS(COS(RADIANS(40.73943)) * COS(RADIANS(lat)) * COS(RADIANS(-73.99585) - RADIANS(lon)) + SIN(RADIANS(40.73943)) * SIN(RADIANS(lat))))) AS distance
FROM [bigquery-public-data:noaa_gsod.stations]
HAVING distance>0
ORDER BY distance
LIMIT 4
(based on http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/)

Felipe Hoffa
- 54,922
- 16
- 151
- 325
-
With standard SQL you could put the logic in a SQL UDF rather than having to put it directly into the query's body. – Elliott Brossard Feb 06 '17 at 14:39
-
I know! I did a quick try, but then I was missing DEGREES() and RADIANS(). Left the query pending until I figure out the equivalent transformations, including the lack of PI(). But I'll be back :) – Felipe Hoffa Feb 06 '17 at 16:43
-