Without a suitable index, the query will do a table scan. Since reading the rows is the dominant part of the execution time (in many cases), the variations you mention do not matter.
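You can see this with `EXPLAIN` (a minimal sketch using the Employee table and columns from your question):

```
-- With no index on title, the plan is a full table scan:
EXPLAIN SELECT * FROM Employee WHERE title = 'Administrator';
-- When no usable index exists, the `type` column shows ALL (table scan).
```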
If you do have a relevant index, and that index is selective enough (10% is likely to be "selective enough"), there will be two steps to the query:
- Scan part of the index, which is a separate BTree.
- For each row found, get the `PRIMARY KEY` (assuming you are using InnoDB) from that BTree. With that PK, look up the row in the main BTree, which contains the PK and the Data.
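As a rough sketch of those two steps (the table definition, the `emp_id` PK, and the index name are my assumptions, not taken from your question):

```
CREATE TABLE Employee (
    emp_id INT NOT NULL AUTO_INCREMENT,
    title  VARCHAR(50),
    dname  VARCHAR(50),
    PRIMARY KEY (emp_id),        -- clustered BTree: holds the PK and the data
    INDEX idx_title (title)      -- secondary BTree: each leaf entry is effectively (title, emp_id)
) ENGINE=InnoDB;

-- Step 1: range-scan idx_title for title = 'Administrator' (about 10% of its entries).
-- Step 2: for each entry found, use the emp_id stored in the leaf to probe the PK BTree for the full row.
SELECT * FROM Employee WHERE title = 'Administrator';
```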
If all the necessary blocks are cached in the buffer_pool (again, assuming InnoDB), the variations in cost are rather minor.
If not all blocks are in cache (because mysqld just started up, or because the index/data is too big to remain cached), then you are into "counting the disk hits", because the dominant part of the "cost" is I/O. Now computing the cost is quite complex: you need to know what percentage is already cached, whether the query will 'thrash' the cache, and whether the 10% are evenly scattered, clumped together, or something in between.
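If you want a rough sense of how often the server is actually going to disk, the standard InnoDB counters help (ordinary MySQL status/variable names, nothing specific to your schema):

```
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- logical reads
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- reads that missed the cache and hit disk
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';               -- how much can stay cached
```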
Since (for InnoDB), the PK is "clustered" with the data, a lookup by PK acts differently than a lookup by a secondary key.
10K rows is "small". 10 buffer pages -- what do you mean? "all are string fields of the same length" -- unrealistic, and not a good idea to use `CHAR` instead of `VARCHAR`. Anyway, the string lengths have very little bearing on this discussion.
`WHERE E.title='Administrator' AND E.dname='Finance'` -- begs for `INDEX(title, dname)`, in either order.
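A minimal sketch of adding it (the index name is mine):

```
ALTER TABLE Employee ADD INDEX idx_title_dname (title, dname);

SELECT *
FROM Employee AS E
WHERE E.title = 'Administrator'
  AND E.dname = 'Finance';
```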
A "Rule of Thumb": One block (InnoDB) can hold 100 rows (of data or index). (This, of course, can vary wildly. But it is sometimes handy for "counting the disk hits".)
In my cookbook, I find it easier to focus on designing the 'best' index, without computing 'cost'.
Further notes on queries
"Assume that only 10% of Employee tuples meet the condition E.title =’Administrator’, only 10% meet E.dname =’Finance’ and only 5% meet both conditions." In the case of MySQL, here are more details:
Case 1: `INDEX(title)` -- similar to the first query -- index range scan of 10%, then probes into the data.
Case 2: `INDEX(dname)` -- ditto.
Case 3: Both indexes -- there is a slim chance of using "index merge intersect" to do two index "range scans", merge the two sets together, then reach into the data for the rows.
Case 4 (the best): `INDEX(title, dname)` (or the opposite order) -- back to an index range scan, but limited to only 5% of the items. (A sketch comparing Cases 3 and 4 follows below.)
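A sketch of Cases 3 and 4 (treated as a fresh setup, independent of the sketches above; index names are mine, and whether the optimizer actually picks the index merge depends on its cost estimates):

```
-- Case 3: two single-column indexes; the optimizer *may* combine them.
ALTER TABLE Employee ADD INDEX idx_title (title), ADD INDEX idx_dname (dname);
EXPLAIN SELECT * FROM Employee WHERE title = 'Administrator' AND dname = 'Finance';
-- If the intersect is chosen, EXPLAIN shows type = index_merge
-- and Extra like "Using intersect(idx_title,idx_dname)".

-- Case 4: with the composite idx_title_dname (title, dname) from the earlier sketch,
-- the same EXPLAIN shows an ordinary ref/range scan over that one index instead.
EXPLAIN SELECT * FROM Employee WHERE title = 'Administrator' AND dname = 'Finance';
```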
MySQL's preferred Engine is InnoDB. What I have discussed assumes that, not MyISAM. With InnoDB, "Data" is stored in a B+Tree, so is every secondary index. Keep this similarity in mind when thinking through how a query is performed. Note also that the 'leaf nodes' of a secondary index contain the PK, thereby providing the mechanism for finding the rest of the record.
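One handy consequence (a sketch, again assuming the `emp_id` PK): a query that needs only the indexed column(s) plus the PK can be answered from the secondary index alone, without ever touching the data BTree:

```
-- With INDEX(title), each leaf entry is effectively (title, emp_id),
-- so this query is satisfied entirely from the secondary index;
-- EXPLAIN marks such a "covered" read with "Using index" in the Extra column.
SELECT emp_id FROM Employee WHERE title = 'Administrator';
```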