2

Possible Duplicate:
How does SQL server work out the estimated number of rows?

I was just looking through SQL Server Graphical Execution plan and I encountered the following two information:

  1. actual number of rows-2385
  2. estimated number of rows-180

I was wondering why this difference was coming. Can you explain me what is the difference between them. It will be helpful if you can explain with some example.

Thanks.

Community
  • 1
  • 1
Rocky Singh
  • 15,128
  • 29
  • 99
  • 146

2 Answers2

6

Row estimates are based on statistics. Inaccurate estimates can come from a number of factors:

  • Low cardinality in your data or stats
  • Out of date or incomplete stats
  • Inefficient execution plan
  • Concurrency issues (changes in data from creation of execution plan to execution time)

As a rule, don't worry about it until you have issues.

I my experience, it's most often based on the cardinality in your stats.

If you are selecting based on two fields, both of which have indexes, the row estimate will be based on a product of the likelihood of the values in the respective indexes, times the total number of rows.

JNK
  • 63,321
  • 15
  • 122
  • 138
  • +1. Your excellent advice to not worry about things unless they become problems is essential. If you add some details on proper stats maintenance (I personally want to know more), this would be a perfect answer – Michael Haren Aug 22 '11 at 15:57
  • @michael - that part isn't too complicated - make sure `AUTO UPDATE STATISTICS` and `AUTO CREATE STATISTICS` are on, and if you have a very busy table it's good to do a manual update with `FULLSCAN` during maintenance. – JNK Aug 22 '11 at 15:59
2

I don't know SQL Server too much, so this is an educated guess: There is a discrepancy between the statistical data kept about the table and the actual data in the table.

Databases keep statistical data about tables, which can be used by the optimizer to find an optimal (least expensive) query plan for the SQL that you want to execute. Those statistics include for instance the total number of rows and the distribution of data across columns (for instance that the field 'gender' contains 60% values 'm' and 40% values 'f'). What your likely seeing here, is that the statistics are outdated and the calculations done by the optimizer hence inacurate: The optimizer assumed the result to contain 180 rows, in reality though (when executing the query) it returns 2385. Try updating the statistics (don't know the exact syntax in SQL server) and see if that changes the numbers.

Janick Bernet
  • 20,544
  • 2
  • 29
  • 55
  • Can you explain with some example where it can be different? – Rocky Singh Aug 22 '11 at 15:52
  • @Rocky if you have status that are out of date--i.e. were last computed when you had few rows but now have many rows--this can happen. You can set stats to automatically recompute (though I'm not sure how that works in practice), or you can create a maintenance plan that recomputes them on a schedule (what I normally do). – Michael Haren Aug 22 '11 at 15:54
  • @Rocky: I redid my explanations a bit, I hope its clearer now. – Janick Bernet Aug 22 '11 at 15:56