
I have more than 10 million rows. Counting the number of rows is too slow. I want to estimate the number of rows.

Current query is like this:

SELECT count(*) FROM `database` WHERE `location` LIKE '%New York%'

This gives me the exact number, but it's too slow. What I want is something like this:

ESTIMATE count(*) FROM `database` WHERE `location` LIKE '%New York%'

Maybe this can be done by limiting the query to a subset of the table (every nth row or something). Maybe there is a built-in function that can be used.

Edit: Perhaps I can limit the query to every nth row, and then multiply the count by n.

Edit2: I just tried counting every 10th row and then multiplying the count by ten, but there was no increase in speed, even though the auto-incrementing `counter` column is the primary key.

SELECT count(*) FROM database WHERE `location` LIKE '%Los Angeles%' and counter%10 = 0
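One likely reason Edit2 saw no speedup: a modulo condition on `counter` cannot be satisfied from the primary-key index as a range, so MySQL still visits every row. A contiguous range can use the index; the sketch below counts within roughly the first 1% of `counter` values and extrapolates. The 1% sample size is illustrative, and the estimate assumes `counter` values are dense (few gaps) and that the sampled range is representative of the whole table:

```sql
-- counter % 10 = 0 forces a scan of all rows; a contiguous range does not.
-- Count matches in ~1% of the table and scale up (sketch; assumes dense,
-- representative counter values):
SELECT count(*) * 100 AS estimated_total
FROM `database`
WHERE counter <= (SELECT MAX(counter) FROM `database`) / 100
  AND `location` LIKE '%New York%';
```

Note the `LIKE '%New York%'` test still has to run on every row in the sampled range; the saving comes only from reading 1% of the rows.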
Ned Hulton
  • There isn't such a thing. You can take a look at https://stackoverflow.com/questions/4400451/optimize-mysql-count-query, but if you have an auto_increment column, why not use its max as the estimate? – nbk Mar 27 '22 at 22:50
  • Right, I can simply add "WHERE COUNTER > 100 and COUNTER < 200" to get a sample of 100 rows. The problem with that is that I would be focusing on a particular part of the table which may not be representative. Is there a way to get every nth row? – Ned Hulton Mar 27 '22 at 22:59
  • If that is what you really need on a regular basis, maintain a counter with AFTER triggers. It costs some time on writes, but you get a quick answer. Counts are always bad, and you must consider whether you really need them – nbk Mar 27 '22 at 23:05

2 Answers


The performance of COUNT(*) is not the biggest problem in this case. The query cannot be made faster, because your search condition LIKE '%...%' forces a table scan: every row in the table must be examined anyway, to check whether it matches the condition.

Alternatives to make it quicker:

  • Match the pattern using fulltext indexing instead of LIKE predicates with wildcards

  • Store the count and keep it updated as you add or change data, as the comment above suggests (but you'd need a separate counter for each possible search pattern)

  • Get a more powerful computer

  • Store less data
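The second bullet can be sketched with a summary table kept current by triggers, as the comment above suggests. The table, column, and trigger names below are hypothetical, and this only works for a fixed, known search term; similar AFTER UPDATE and AFTER DELETE triggers would also be needed to keep the count accurate:

```sql
-- Hypothetical summary table: one pre-computed count per search term.
CREATE TABLE location_counts (
  term VARCHAR(100) PRIMARY KEY,
  cnt  BIGINT NOT NULL DEFAULT 0
);

-- Bump the 'New York' count as matching rows are inserted (sketch).
DELIMITER //
CREATE TRIGGER bump_ny_count AFTER INSERT ON `database`
FOR EACH ROW
BEGIN
  IF NEW.location LIKE '%New York%' THEN
    UPDATE location_counts SET cnt = cnt + 1 WHERE term = 'New York';
  END IF;
END//
DELIMITER ;

-- Reading the count is then an indexed point lookup:
SELECT cnt FROM location_counts WHERE term = 'New York';
```

The trade-off is exactly the one noted in the comments: each write gets slightly slower, but the count query becomes a single-row lookup.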

Bill Karwin
  • In some RDBMS products, creating an index lets the system read the number of records straight from the index structure; would MySQL have something like that (of course, after removing LIKE '%..') – tinazmu Mar 28 '22 at 00:47
  • 1
    Yes, values in an index are persistent, but MySQL does not have indexes that can store aggregated calculations like `COUNT(*)`. – Bill Karwin Mar 28 '22 at 01:52
  • 1
    MySQL's outdated MyISAM storage engine keeps a record of the total number of rows in the table, which makes `SELECT COUNT(*)` fast, _provided your query have no condition in the WHERE clause._ But InnoDB (the current default storage engine) does not support this, because the count of rows in a table varies due to transactions. – Bill Karwin Mar 28 '22 at 01:54
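For an unfiltered total, InnoDB does expose an approximate row count without a scan, via the statistics it maintains. This is only a whole-table estimate (it ignores any WHERE condition) and can deviate noticeably from the true count:

```sql
-- Approximate row count from InnoDB table statistics; no scan,
-- no WHERE filtering, and the value is only an estimate.
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'database';
```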

This would run immensely faster if you have INDEX(city):

WHERE `city` = 'New York'

This would run nearly as fast as that:

WHERE `city` LIKE 'New York%'

If location is currently a combination of address, city, state, and zip, you may need to rethink the choice of storing it all in a single column. Or go with a FULLTEXT index, as Bill Karwin suggests.
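Both options can be sketched as follows; the column and index names are illustrative, and the FULLTEXT variant assumes the search term meets MySQL's minimum indexed word length:

```sql
-- Option 1: a separate, indexed city column. Equality and prefix
-- searches can then use the index instead of scanning the table.
ALTER TABLE `database`
  ADD COLUMN city VARCHAR(100),
  ADD INDEX idx_city (city);

SELECT count(*) FROM `database` WHERE city = 'New York';

-- Option 2: a FULLTEXT index on the existing location text.
ALTER TABLE `database` ADD FULLTEXT INDEX ft_location (`location`);

SELECT count(*) FROM `database`
WHERE MATCH(`location`) AGAINST('"New York"' IN BOOLEAN MODE);
```

MATCH ... AGAINST is not a drop-in replacement for `LIKE '%...%'` (it matches whole words, not arbitrary substrings), but for place names it is usually what is wanted, and it avoids the full scan.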

Rick James