
I have a table with several million locations and a table with about 100k points of interest on SQL Server 2019 Standard. Now I'm trying to translate the following logic into a query: "Show me all locations that have bus stations within a radius of 500m to 2000m." (In the future I'd like to specify an exact number of bus stations within a given radius.)

My initial approach was this:

SELECT
    L.ID,
    COUNT(*)
FROM
    dbo.Locations AS L
    INNER JOIN POI AS P ON P.GeographyPoint.STDistance(L.GeographyPoint) BETWEEN 500 AND 2000
GROUP BY
    L.ID

This basically works, but execution time is unacceptable for a web application and increases with the radius and the number of POIs. I also tried joining on

P.GeographyPoint.STBuffer(500).STContains(L.GeographyPoint) = 0 AND
P.GeographyPoint.STBuffer(2000).STContains(L.GeographyPoint) = 1

or Filter(L.GeographyPoint) = 1 or STIntersects(L.GeographyPoint) = 1 on the buffered point, with slightly different but still unsatisfying results. I also tried creating a spatial index on a precalculated buffered geography column with a 1000m radius; building it took an hour, the index takes up a huge amount of space, and performance is still poor. Caching all distances of all POIs to all locations would result in about 400 billion rows, so that's no option either.

Any help is appreciated!

wuha
  • Keep in mind that each degree of lat or lng is approx 69 miles. Rather than testing every combination, you can dramatically reduce the number of records to test with a simple WHERE (see the sketch after these comments). Take a peek at "Item 2" https://stackoverflow.com/questions/45869613/handling-very-big-table-in-sql-server-performance/45870223#45870223 – John Cappelletti Mar 02 '21 at 18:15
  • The nearest neighbor performance enhancements in SQL Server have a very specific shape. Last time I looked, BETWEEN wasn't part of that. The doco shows the supported predicate shapes. You have to keep your predicates in one of those formats. – Greg Low Mar 02 '21 at 20:30
  • @JohnCappelletti Good input, first runs were not that promising. I need to figure out a way to make it work with all the indexes being used, since I need the index for lat/lng and the spatial one for the geographypoint comparison – wuha Mar 04 '21 at 10:33
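
To make that bounding-box suggestion concrete, here is a rough sketch of the pre-filter, assuming Locations and POI both carry plain Latitude/Longitude columns next to GeographyPoint (an assumption; adjust the names to your schema). The cheap range comparison throws away most candidate pairs before the expensive STDistance call runs:

DECLARE @LatDeg float = 2000.0 / 111320.0;  -- ~0.018 degrees of latitude per 2 km

SELECT
    L.ID,
    COUNT(*)
FROM
    dbo.Locations AS L
    INNER JOIN POI AS P
        ON  P.Latitude  BETWEEN L.Latitude  - @LatDeg
                            AND L.Latitude  + @LatDeg
        -- longitude degrees shrink with cos(latitude), so widen the window accordingly
        AND P.Longitude BETWEEN L.Longitude - @LatDeg / COS(RADIANS(L.Latitude))
                            AND L.Longitude + @LatDeg / COS(RADIANS(L.Latitude))
        AND P.GeographyPoint.STDistance(L.GeographyPoint) BETWEEN 500 AND 2000
GROUP BY
    L.ID;

With a regular (non-spatial) index on POI(Latitude, Longitude) the first two predicates can seek, and only the surviving rows pay for the spatial call.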

1 Answer


I think your instinct to pre-calculate areas around your locations is a good one. But as another commenter said, the spatial query optimizations assume some things and your "distance between this and that" doesn't fit those assumptions.

However! I think I've come up with "one weird trick" to get you what you're looking for. Observe the following:

declare @p geography = geography::Point(37.540830, -122.299880, 4326);

select @p.STBuffer(2000).STIntersection(@p.STBuffer(500).ReorientObject());

This creates a donut with outer radius 2000m and inner radius 500m centered at the specified point. If you store that in your pre-calculated column, you should be able to do something like:

SELECT
    L.ID,
    COUNT(*)
FROM
    dbo.Locations AS L
    INNER JOIN POI AS P ON P.GeographyPoint.STIntersects(L.Donut) = 1
GROUP BY
    L.ID;

In the interest of making this inexpensive to try, you could put these donuts into a separate table that holds just (LocationID, Donut). That way, you could calculate them for, say, 100 of your points and try out the spatial query w/o needing to make changes to the actual Locations table. If you go this route, change the query to go against this new table rather than against Locations, as sketched below.
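
In case it helps to see it end to end, here is a rough sketch of that side table; the table and index names (dbo.LocationDonuts, IX_LocationDonuts_Donut) are placeholders, not anything from your schema:

CREATE TABLE dbo.LocationDonuts
(
    LocationID int NOT NULL PRIMARY KEY,
    Donut      geography NOT NULL
);

-- Build the 500m-2000m ring for a small sample of locations.
INSERT INTO dbo.LocationDonuts (LocationID, Donut)
SELECT TOP (100)
    L.ID,
    L.GeographyPoint.STBuffer(2000)
        .STIntersection(L.GeographyPoint.STBuffer(500).ReorientObject())
FROM dbo.Locations AS L;

-- Spatial index so the STIntersects join can seek instead of scanning.
CREATE SPATIAL INDEX IX_LocationDonuts_Donut
    ON dbo.LocationDonuts (Donut);

SELECT
    D.LocationID,
    COUNT(*)
FROM
    dbo.LocationDonuts AS D
    INNER JOIN POI AS P ON P.GeographyPoint.STIntersects(D.Donut) = 1
GROUP BY
    D.LocationID;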

Ben Thul
  • Thanks, I just created a test location table with about 12k rows and a spatial index on a column with a 5000m buffered geographypoint. Joining this against my POI table with 100k rows using `Filter()` (which is faster than `STContains()`) already takes 9 seconds... I think it might have to do with joining many-to-many, because searching POIs around one specific location is quite fast. I tried another interesting thing: selecting and buffering all POI geographypoints and joining them into one multipolygon using `STUnion()` (rough sketch of that below), still not that great. – wuha Mar 03 '21 at 09:53
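
For completeness, a rough sketch of that "union everything into one polygon" idea, using geography::UnionAggregate (available since SQL Server 2012) instead of a manual STUnion loop; note it only answers "is there any bus stop in range", not how many:

DECLARE @AllStops geography =
(
    -- Merge the 2000m buffers of every POI into a single geography instance.
    SELECT geography::UnionAggregate(P.GeographyPoint.STBuffer(2000))
    FROM POI AS P
);

SELECT L.ID
FROM dbo.Locations AS L
WHERE L.GeographyPoint.STIntersects(@AllStops) = 1;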