0

I have a table for GeocodedPoints. This query that I am optimizing tries to pull out matching points for a lat-long location. Unfortunately it's too slow!

The table is essentially a list of bounding boxes and a corresponding address. It also contains a DBGeography of the exact bounding box but, given how slow this is in SQL, I materialize that into .NET land and query the DBGeography there.

My query then basically looks to see if a point is within the bounding box [specified by NESW] and returns the results.

In my mind, this should be very quick, but alas, it's not as fast as I thought it should be.

I have a none-unique, non-clustered index on the bounds and the UTC like so

enter image description here

Note the UTC is required as we only return results that have been modified within the last 2 weeks.

I have run this through SQL Profiler tools and here's some information:

  • 500k rows in that table
  • Duration ranges from 250-350ms per call
  • Reads ranges from 5-20k

And finally here is the query I use

exec sp_executesql N'SELECT 
    [Project1].[ID] AS [ID], 
    [Project1].[CENTER] AS [CENTER], 
    [Project1].[BOUNDS] AS [BOUNDS], 
    [Project1].[UTC_UPDATED] AS [UTC_UPDATED], 
    [Project1].[PLACE_ID] AS [PLACE_ID], 
    [Project1].[FORMATTED_ADDRESS] AS [FORMATTED_ADDRESS], 
    [Project1].[POST_CODE] AS [POST_CODE], 
    [Project1].[SOURCE] AS [SOURCE], 
    [Project1].[North] AS [North], 
    [Project1].[East] AS [East], 
    [Project1].[South] AS [South], 
    [Project1].[West] AS [West]
    FROM ( SELECT 
        [Extent1].[ID] AS [ID], 
        [Extent1].[CENTER] AS [CENTER], 
        [Extent1].[BOUNDS] AS [BOUNDS], 
        [Extent1].[UTC_UPDATED] AS [UTC_UPDATED], 
        [Extent1].[PLACE_ID] AS [PLACE_ID], 
        [Extent1].[FORMATTED_ADDRESS] AS [FORMATTED_ADDRESS], 
        [Extent1].[POST_CODE] AS [POST_CODE], 
        [Extent1].[SOURCE] AS [SOURCE], 
        [Extent1].[North] AS [North], 
        [Extent1].[East] AS [East], 
        [Extent1].[South] AS [South], 
        [Extent1].[West] AS [West]
        FROM [dbo].[HST_GEOCODE_POINTS] AS [Extent1]
        WHERE ([Extent1].[UTC_UPDATED] > @p__linq__0) AND ([Extent1].[North] >= @p__linq__1) AND ([Extent1].[East] >= @p__linq__2) AND ([Extent1].[South] <= @p__linq__3) AND ([Extent1].[West] <= @p__linq__4)
    )  AS [Project1]
    ORDER BY [Project1].[UTC_UPDATED] DESC, [Project1].[SOURCE] DESC',N'@p__linq__0 datetime2(7),@p__linq__1 float,@p__linq__2 float,@p__linq__3 float,@p__linq__4 float',@p__linq__0='2017-05-16 11:12:12.4425257',@p__linq__1=53.016466402998645,@p__linq__2=-1.715320912729779,@p__linq__3=53.016466402998645,@p__linq__4=-1.715320912729779

Note My UTC is currently first in this query but last in the index. Ironically this seems to make my queries faster although makes every call hit 20k reads.

Chris
  • 26,744
  • 48
  • 193
  • 345
  • @GordonLinoff I'm not sure what you mean? I already have a DBGeography on the table but cannot use it for querying as its too slow. – Chris May 30 '17 at 11:29
  • Mind adding execution plan from SQL Sentry Plan Explorer (a free tool)? That would help a lot. – Evaldas Buinauskas May 30 '17 at 11:41
  • How many rows are typically updated in two weeks. If not that many, it *may* be more efficient to have the date column first in the index... – user1429080 May 31 '17 at 07:14

1 Answers1

0

1) Use geography instead of DBGeography to store data: https://learn.microsoft.com/en-us/sql/t-sql/spatial-geography/spatial-types-geography

2) Create a spatial index on geography column https://learn.microsoft.com/en-us/sql/relational-databases/spatial/spatial-indexes-overview

3) Seek rows rows you need using your_geography.STIntersects(other_geography) function or similar

Unfortunatly you cannot another columns in your spacial index (UTC_UPDATED for example).

p.s. you can also try geometry type

Mikhail Lobanov
  • 2,976
  • 9
  • 24
  • Hi, thanks for your answer. I already have a column of type geography and had an index on it but it was far slower than my current method. My current method is actually 50x faster in fact on execution time and reads about 15x less rows – Chris May 31 '17 at 07:37
  • 50x faster means that with `geography` column query executes 300ms*50 = ~15 seconds? Are your sure that spatial index has been used? How many rows is filtered by coordinates and how many rows is filtered by date? – Mikhail Lobanov May 31 '17 at 07:44
  • At the time I wrote down my stats the query using geography was taking 1650 ms and hitting 319,450 reads. With the query above it was hitting 21,195 reads and taking 31ms. After having ran the system for a few more days I'm seeing it slow down a touch hence wondering what I could do. Every resource I see online states that geography is inherently slow though? – Chris May 31 '17 at 07:49
  • For test purpose: how many rows would be returned if you query without `[Extent1].[UTC_UPDATED] > @p__linq__0` condition? Or if you leave only that condition? 1650 ms is too small for 500k rows with spatial index. I think somethink goes wrong... – Mikhail Lobanov May 31 '17 at 07:58