
I have a database sitting on a SQL Server 2008 instance with ~12 billion rows, each containing lat, lon and a corresponding geography field. I recently needed to add the ability to query on the geography field, so I added a spatial index, which took 6 days to process over 4TB of data.

CREATE SPATIAL INDEX IX_Location_Geo ON Location
(
    Geo
) USING GEOGRAPHY_GRID
WITH (
    GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
    CELLS_PER_OBJECT = 16, PAD_INDEX = OFF, SORT_IN_TEMPDB = OFF,
    DROP_EXISTING = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

And I intended to query it like this...

SELECT TOP 100 
    ci.LocationID, ci.Geo.STDistance(@g)  
FROM Location ci WITH(INDEX(IX_Location_Geo))
WHERE ci.Geo.Filter(@region) = 1 
ORDER BY ci.Geo.STDistance(@g)

Here's the estimated execution plan...

[Execution plan image]

I tested this query on a sample set of 100 million rows and it worked splendidly. But on 12 billion rows the query runs for ~4 hours without responding and finally fails with a disk write error, which is strange because the disk has 5TB unused.

Msg 1101, Level 17, State 10, Line 4 Could not allocate a new page 
for database 'TEMPDB' because of insufficient disk space in filegroup 
'DEFAULT'. Create the necessary space by dropping objects in the filegroup, 
adding additional files to the filegroup, or setting autogrowth on for 
existing files in the filegroup.

Hoping someone might spot an obvious oversight on my part. Big thanks!

tidwall
    I'll start looking once I get my 5TB RAID-0 of SSDs :) – RichardTheKiwi Oct 07 '12 at 19:29
  • Please post the "generic error" message. Might be helpful. – usr Oct 07 '12 at 19:33
  • What does the query plan look like? Specifically, how do they differ from your small data set and your large one? – Ben Thul Oct 08 '12 at 02:25
  • @usr - error is `Msg 1101, Level 17, State 10, Line 4 Could not allocate a new page for database 'TEMPDB' because of insufficient disk space in filegroup 'DEFAULT'. Create the necessary space by dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.` – tidwall Oct 08 '12 at 04:27
  • @BenThul - seems that because of the error on the large data, the execution plan never completes. – tidwall Oct 08 '12 at 04:43

2 Answers


Instead of scaling vertically (adding more memory, CPU and disk to a single beefy machine), consider scaling horizontally (splitting the load across many commodity servers). Any operation takes time and space; in Big-O terms, any computation that grows faster than O(N) is doomed at this kind of volume. That, at a high level, is why the query runs for hours and eventually errors out.

Possible solution

Change the pattern of data access. Use sharding: split the data into smaller chunks. Use the WHERE clause extensively, together with a Skip/Take pagination pattern (I am not sure about the proper syntax in T-SQL; a sketch follows below). The Map-Reduce pattern is also making a buzz. In short, stop scaling vertically at that volume.
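
As a minimal sketch of Skip/Take paging on SQL Server 2008, where OFFSET/FETCH is not available, ROW_NUMBER() is the usual substitute. The @PageNumber and @PageSize parameters below are illustrative only, and @g and @region are assumed to be declared as in the question:

DECLARE @PageNumber INT = 1, @PageSize INT = 100;  -- illustrative paging parameters

WITH Paged AS (
    SELECT
        ci.LocationID,
        ci.Geo.STDistance(@g) AS DistanceMeters,
        ROW_NUMBER() OVER (ORDER BY ci.Geo.STDistance(@g)) AS RowNum
    FROM Location ci
    WHERE ci.Geo.Filter(@region) = 1
)
SELECT LocationID, DistanceMeters
FROM Paged
WHERE RowNum BETWEEN (@PageNumber - 1) * @PageSize + 1
                 AND @PageNumber * @PageSize;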

oleksii
  • Good tips. I like the idea of sharding. Considering that SQL seems to handle 100 mil just fine, I may be able to break the index up into many smaller databases. I'll put together a test and see what works. – tidwall Oct 08 '12 at 04:49

The error message you posted says that the disk space ran out for tempdb, not for your main database. You could make more space available to it, but SQL Server should not be consuming that much in the first place, so that is not a real solution.
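
If you do decide to give tempdb more headroom as a stopgap, adding a data file on a volume with free space is the kind of change the error message suggests. This is just a sketch; the logical name, path, and sizes below are hypothetical:

ALTER DATABASE tempdb
ADD FILE (
    NAME = tempdev2,                      -- hypothetical logical file name
    FILENAME = 'E:\TempDB\tempdev2.ndf',  -- hypothetical path on a drive with free space
    SIZE = 50GB,
    FILEGROWTH = 10GB
);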

Please post the estimated execution plan (since you cannot obtain the actual one). I will update this answer with my thoughts on the plan.

As a general comment: Debugging a query perf problem usually starts with the plan because it tells you what SQL Server does at execution time.

usr
  • Can you upload the .sqlplan XML somewhere? The problem seems to be that the intermediate result set obtained from filtering on `@region` is too big. Try this: http://msdn.microsoft.com/en-us/library/ff929109.aspx (try adding a where condition with a certain max distance). – usr Oct 08 '12 at 15:07
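
For reference, the max-distance suggestion in the comment above amounts to adding an index-friendly predicate along these lines; the 10000-meter cutoff is purely illustrative:

SELECT TOP 100
    ci.LocationID, ci.Geo.STDistance(@g)
FROM Location ci WITH(INDEX(IX_Location_Geo))
WHERE ci.Geo.Filter(@region) = 1
  AND ci.Geo.STDistance(@g) < 10000  -- hypothetical cutoff: only rows within ~10 km of @g
ORDER BY ci.Geo.STDistance(@g);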