
I have a table with 2.5 million rows (I know; development to warehouse this data is nearly finished).

The system querying this table has, understandably, been suffering some SQL timeouts. I profiled a few of my queries and noticed that 97% of the work was spent scanning an index.

There was no index on the table other than the clustered index on the ID column. The queries that were timing out all appeared to be filtering on the table's one datetime column in the WHERE clause, so I added an index on that datetime column.
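For concreteness, the change amounts to something like the following; the table and column names here are hypothetical, since the question doesn't show the actual schema:

    -- Hypothetical names: the real table and column aren't shown above.
    CREATE NONCLUSTERED INDEX IX_Events_CreatedAt
        ON dbo.Events (CreatedAt);

    -- A range predicate on the indexed column can now be answered with an
    -- index seek instead of a scan:
    SELECT Id, CreatedAt
    FROM dbo.Events
    WHERE CreatedAt >= '20180901' AND CreatedAt < '20180913';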

I used the index wizard, which spun on a loading screen for a few minutes, and then CPU usage instantly dropped from a steady ~95% to around 9%. I thought I'd killed the service, but it's just running with a lot less CPU now.
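One way to confirm the gain really comes from the new index (rather than something else changing) is to check the index usage statistics DMV; again, the object names here are hypothetical:

    -- Seeks, scans and lookups recorded against each index since the last restart.
    SELECT i.name AS index_name,
           s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS s
           ON  s.object_id   = i.object_id
           AND s.index_id    = i.index_id
           AND s.database_id = DB_ID()
    WHERE i.object_id = OBJECT_ID('dbo.Events');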

A couple of questions:

  • Is there anything I need to be aware of given this huge performance gain? Have I just moved the cost elsewhere (to memory rather than CPU, for example)? Or was the table simply crying out for an index?
  • I was under the impression that an index needs to be built. While the wizard was loading (for a minute or so), was that the index build completing? I expected to have to wait an hour or so given the 2.5 million records (one way to time the build explicitly is sketched below).
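For what it's worth, you can take the wizard out of the picture and time the build yourself with a plain CREATE INDEX and timing statistics on; a minimal sketch, again with hypothetical names:

    -- Prints CPU and elapsed time for the build to the Messages tab.
    SET STATISTICS TIME ON;

    CREATE NONCLUSTERED INDEX IX_Events_CreatedAt
        ON dbo.Events (CreatedAt)
        WITH (ONLINE = ON);  -- Enterprise edition only; lets writers continue during the build.

    SET STATISTICS TIME OFF;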
  • I think we would need to see a) the query, and b) the execution plan, in order to give you an exact answer. Just because SQL Server spends all its time scanning an index does not mean that there would be a better alternative. Doing a sequential scan might have been an even worse option. – Tim Biegeleisen Sep 13 '18 at 13:58
  • 2.5M rows isn't a lot, btw. – Panagiotis Kanavos Sep 13 '18 at 14:04
  • What they said ^. Plus, there isn't enough information regarding the composition of the data rows, or the actual index you created. If it's a single-column index on the single datetime column, then all SQL Server has to do is grab, re-arrange and dump that data into an index in some data pages. If it's a single field plus index key, I'd be sad if it wasn't almost instant, even on 2.5m rows. Index columns + includes determine how many data pages are needed to build the index, and therefore the more actual data in a single index row, the longer it will take (no more comment space; see the covering-index sketch after these comments). – Dave Brown Sep 13 '18 at 14:09
  • If a query returns many rows, the query engine may decide that it's more efficient to do a scan. It depends a little on the locking scheme in effect, a lot on the number of rows being returned, and a lot on how well the data being searched for matches the index. Without the query plan, everyone here (including me) is just blowing smoke. – benjamin moskovits Sep 13 '18 at 14:21
  • Thanks gents, all useful feedback. Seems there's a lot to learn here on my part. I'm reassured to hear 2.5m rows isn't a lot, but I think aggregating the data up into time periods for reporting purposes would still be sound practice here, so I see this index as a fix to buy me more time. I will share the query and previous plan to improve the quality of my question. Thanks Dave, I suspected that was the case. This one has no includes. A few queries filter on this datetime field but reference different columns in the SELECT, and I didn't want to include them all. Not sure if that was right or not. –  Sep 13 '18 at 14:22
  • @benjaminmoskovits - understood. Question's too vague. I will share the table, query, and plan shortly when I jump back onto the server. Thanks for the comment. –  Sep 13 '18 at 14:22
  • If different versions of this query are being used, you may want to explore using Profiler to capture a slice of activity (let's say a couple of hours) against this particular db (use Profiler's filters aggressively to narrow what is captured), and then run the resultant trace through the Database Engine Tuning Advisor. Run the trace against DETA on a dev machine overnight; it takes a lot of resources and can take hours, but I have gotten good advice from it. – benjamin moskovits Sep 13 '18 at 14:29
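To illustrate Dave Brown's point about includes: INCLUDE columns are carried in the index's leaf pages, so the wider the index row, the more pages the build has to write and the longer it takes. A minimal sketch, with hypothetical column names:

    -- Covers queries that filter on CreatedAt and select CustomerId and Amount,
    -- at the cost of wider index rows and a longer build than the single-column index.
    CREATE NONCLUSTERED INDEX IX_Events_CreatedAt_Covering
        ON dbo.Events (CreatedAt)
        INCLUDE (CustomerId, Amount);

Since the question's queries select different column sets, the narrow single-column index created above is a reasonable middle ground.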

0 Answers