2

Given a query like

SELECT franchise, MAX(worth)
FROM figurines
GROUP BY franchise

what sort of index would speed this query up, and how would the database use that index?

If more detail is required, assume that the column franchise has a relatively low cardinality and that worth has a very high cardinality.

I personally am using mysql, but I'm looking for a general understanding of the algorithm, not vendor-specific implementation details.

Chris Middleton
  • 5,654
  • 5
  • 31
  • 68
  • 1
    Look here for loose index scan for mysql http://dev.mysql.com/doc/refman/5.1/en/group-by-optimization.html ,extremely efficient.AFAIK as I know all dbs have this except Postgres. – Mihai Apr 08 '15 at 19:31
  • 1
    @Mihai - SQL Server doesn't. Though if the index was on `franchise, worth` it is possible to simulate a skip scanning type behaviour with a recursive CTE. – Martin Smith Apr 08 '15 at 20:02

1 Answers1

1

Scenario 1: no index (read the whole table)

foreach(page in table.pages)
{
  foreach(row in page.rows)
  {
    Compare and accumulate franchise and worth from row
  }
}
-- Total IO = table.pages

Scenario 2: index on franchise only

foreach(page in index.pages)
{
  foreach(indexRow in page.rows)
  {
    tableRow = table.fetchRow(indexRow); // + 1 page of IO for each row
    Compare and accumulate franchise from indexRow and worth from tableRow
  }
}
-- Total IO = index.pages + table.rows
-- this is likely to be greater than Scenario 1...
--  so optimizer should prefer that plan instead.

Scenario 3: covering index (franchise, worth) in that order.

foreach(page in index.pages)
{
  foreach(row in page.rows)
  {
    Compare and accumulate franchise and worth from row
  }
}
-- Total IO = index.pages
-- Assuming that index is thinner than table, a win!

Scenario 4: different query with known list of franchises with index from Scenario 3

foreach(franchise in franchises)
{
  SELECT MAX(worth) FROM figurines WHERE franchise = franchise
}

...

foreach(franchise in franchises)
{
  search into the index looking for the last record with this franchise
  // this is usually less than 10 pages of IO in my experience.
}
-- Total IO = count of franchise * 10
-- super win!

Scenario 4 is different because it invoices a seek and not a scan.

Amy B
  • 108,202
  • 21
  • 135
  • 185