I have 2 tables
USER contains a UserID and 2 zip codes (around 1 million records)
+--------+----------+----------+
| UserID | Zipcode1 | Zipcode2 |
+--------+----------+----------+
| 1 | 08003 | 10016 |
| 2 | 11780 | 48073 |
| 3 | 57106 | 33487 |
+--------+----------+----------+
LOCATION contains a LocationID and a zip code (around 1000 records)
+------------+---------+
| LocationID | Zipcode |
+------------+---------+
| 1 | 33004 |
| 2 | 96818 |
| 3 | 08816 |
+------------+---------+
I have a function that takes the zipcodes, joins to a table with latitude/longitude, calculates the distance between user zips and the location zip, and returns the shorter distance.
dbo.fnMinZipDistance(Location.Zipcode, User.Zipcode1, User.Zipcode2)
example from user 1 & location 1:
dbo.fnMinZipDistance('33004', '08003', '10016') returns 995.383
because the distance from 33004 to 10016 is 995.383
and the distance from 33004 to 08003 is 1067.802
For each UserID, I need the 3 LocationIDs with the shortest distance to either user zipcode.
My initial avenue of attack was to get the distance for each location, partition by user ordered by distance, and select where the row number is < 4
WITH UserLocations as
(
SELECT
U.UserID,
L.LocationID,
rowNum = row_number() Over(partition by U.UserID ORDER BY dbo.fnMinZipDistance(L.Zipcode, U.Zipcode1, U.Zipcode2))
FROM
USERS U
JOIN LOCATIONS L ON 1 = 1
)
SELECT * FROM UserLocations WHERE rowNum < 4
This is taking multiple days to run, since before I can get the 3 closest, I must get all distances which total around 1 billion records.
My next thought was to precalculate every possible zip code distance in a table to join to, but that would be around 1.8 billion possible combinations (roughly 43000 active US zip codes), and I'm not sure how much that will help. I'm running it for comparison now, and I am at 6 hours on the attempt.
I need to drastically reduce the run time for this query.
Any suggestions would be extremely appreciated.