For this example, I have one table which includes a list of people, a group category, and the location for each individual person (Long/Lat coordinates). A single individual can be in multiple groups. Here is an example table:
Person Group Long Lat
1 1 11 23
2 1 12 24
. . . .
. . . .
. . . .
2 2 12 24
I have another table which lists Businesses, their locations, and a shared group that matches the groupings in the first table. Again, a Business can be in multiple groups. Example table:
Busns Group Long Lat
5 1 5 6
6 1 6 7
. . . .
. . . .
. . . .
5 2 5 6
I want to, by Person and by Group, match the Business with minimum distance between the two. This is proving to be a very memory-intensive task as I have it. Currently I create an enormous table through a RIGHT JOIN
, which then measures the distance between a person and a business for every group. then I create another that finds the minimum distance for each person in a group, and then I do an INNER JOIN
in order to pair the original table down. Example code:
DROP TABLE IF EXISTS DistancePairs;
CREATE LOCAL TEMPORARY TABLE DistancePairs ON COMMIT PRESERVE ROWS AS (
SELECT a.Person
,a.Group
,b.Business
,a.Latitude AS PersonLat
,a.Longitude AS PersonLong
,b.Latitude AS BusinessLat
,b.Longitude AS BusinessLong
,0.621371*DISTANCEV(a.Latitude,a.Longitude,b.Latitude,b.Longitude) AS AproxDistance
FROM people a
RIGHT JOIN business b
ON a.Group = b.Group
);
DROP TABLE IF EXISTS MinDist;
CREATE LOCAL TEMPORARY TABLE MinDist ON COMMIT PRESERVE ROWS AS (
SELECT DISTINCT
Person
,Group
,MIN(AproxDistance) AS AproxDistance
FROM Distance Pairs
);
SELECT a.Person
,a.Group
,a.Business
,a.AproxDistance
FROM DistancePairs a
JOIN MindDist b
ON a.Person = b.Person
AND a.Group = b.Group
AND a.AproxDistance = b.AproxDistance
;
Is there a better way to do this? This performs terribly and runs for hours given the size of the data set I'm using. The original Person and Business tables have already been created using WHERE statements to limit their size.