0

I have a very big table with a lot of rows and a lot of columns (I know it's bad but let's leave this aside).

Specifically, I had two columns - FinishTime, JobId. The first one is the finish time of the row and the second is its id (not unique, but almost unique - only few records exist with the same id).

I have index on jobid and index on finishtime.

We insert rows all the time, mostly ordered by the finish time. We also update statistics of each index periodically.

Now to the problem:

When I run query with filter jobid==<some id> AND finishtime > <now minus 1 hour> - this query takes a lot of time, and when showing the estimated execution plan I see that the plan is to go over the finishtime index, even though going over the jobid index should be a lot better. When looking at the index statistics, I see that the server "thinks" that the number of jobs in the last hour is 1 because we didn't update the statistics of this index.

When I run query with filter jobid==<some id> AND finishtime > <now minus 100 days> - this works great, because the SQL Server knows to go over the correct index - the job id index.

So basically my question is why if we don't update index statistics all the time (which is time consuming), the server assumes that the number of records past the last bucket is 1?

Thanks very much

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
duduamar
  • 3,816
  • 7
  • 35
  • 54

1 Answers1

1

You can get a histogram of what the statistics contains for an index using DBCC SHOW_STATISTICS, e.g.

DBCC SHOW_STATISTICS ( mytablename , myindexname ) 

For date-based records, queries will always be prone to incorrect statistics. Running this should show that the last bucket in the histogram has barely any records in the range [prior-to-today / after-today]. However, all else being equal, SQL Server should still prefer the job_id index to the finishtime index if both are single-column indexes with no included columns; this is due to job_id (int) being faster to lookup than finishtime (datetime).

Note: If your finishtime is covering for the query, this will heavily influence the query optimizer in selecting it since it eliminates a bookmark lookup operation.

To combat this, either

  1. update statistics regularly
  2. create multiple filtered indexes (2008+ feature) on the data, with one partition updated far more regularly being the tail end
  3. use index hints on sensitive queries
RichardTheKiwi
  • 105,798
  • 26
  • 196
  • 262
  • Well you explained the issue, but I don't understand what can I do about it? I have instructed my users to user very old finish time when they query for specific job id at the moment – duduamar Dec 06 '12 at 07:48
  • 1
    If you know the main SPs that use the date filter in tandem with job id, add an index hint – RichardTheKiwi Dec 06 '12 at 08:25
  • why does the SQL server thinks the number of rows past the last bucket is 1 and not for example average of buckets? I consider this to be some kind of bug. Anyway, helpful tips. Thanks! – duduamar Dec 06 '12 at 10:16