According to this:
Selectivity is the value between 0 and 1, and it is the fraction of rows returned after applying a filter on the table. For example if a table has 10,000 rows and the query returns 2601 rows, the selectivity would be 2601/10000 or .26 or 26 percent. Selectivity enables you (or optimizer for that matter) to decide which data access method is optimum in the execution plan.
I need some clarifications: ok, that table has 10000 rows and only 2601 are returned by the query. But what if, that query block contains three joined tables, or it contains a subquery in the where clause? So in the FROM clause there are three tables, and the fourth table is part of a where clause subquery, then how is this selectivity calculated?
Selectivity = number of rows satisfying a condition (from which table?) / total number of rows (from all the four tables?)
Same question for cardinality (cardinality = selectivity * total number of rows).
I found many articles about this, but each of them exemplifies these concepts with simple select statements, based on a single table or a single where clause condition.
Can someone give me an example of how are these measures calculated in case of a bit more complex query (on "hr" schema, or other training purpose schema), meaning subqueries in the FROM clause, or WHERE clause, and so on?
Thank you.
EDIT: I need some clarification about the selectivity measure, computed by the Estimator (Cost-Based Optimizer). http://gerardnico.com/wiki/database/oracle/selectivity For example, for an equality predicate (last_name = 'Smith'), selectivity is set to the reciprocal of the number n of distinct values of last_name, because the query selects rows that all contain one out of n distinct values.
I don't know how to understand that "reciprocal of the number n of distinct values".
Assuming the employees table have 107 rows, and the query Code: [Select all] [Show/ hide]
select * from employees where last_name = 'Smith'
returns 2 rows, the selectivity is 2/107 = 0.01? So it's the number of rows satisfying the predicate / total number of rows. So no "distinct" involved in this equation.
Apart from this selectivity of the statement, there is also a column selectivity, which is represented by the NDV (number of distinct values in that column - which can be queried from dba_tab_col_statistics) / total number of rows (http://www.runningoracle.com/product_info.php?products_id=233). So if NDV is 103, the selectivity of the last_name column is 103/107 = 0,96.
This is what I understood.. is this correct? If I'm wrong, please correct me.
Thank you.