3

There are a number of similar questions on here, but a lot of the answers say to force the use of an index and that doesn't seem to speed anything up for me.

I want to show a "live" counter on my website showing the number of rows in a table. Kind of like how some websites show the number of registered users, or some other statistic, in "real time" (i.e. updated frequently using ajax or websockets).

My table has about 5M rows. It's growing fairly quickly and there is a high volume of inserts and deletes on it. Running

select count(*) from my_table

takes 1.367 seconds, which is unacceptable because I need my application to get the new row count about once per second.

I tried what many of the answers on here suggest and changed the query to:

select count(*) from my_table use index(my_index)

where my_index is a normal BTREE index on a BIGINT column. But the time actually increased to 1.414 seconds.

Why doesn't using an index speed up the query as many answers on here said it would?

Another option some answers suggest is to put a trigger on the table that increments a column in another table. So I could create a stats table and whenever a row is inserted or deleted in my_table have a trigger increment or decrement a column in the stats table. Is this the only other option, since using an index doesn't seem to work?
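
For reference, the kind of setup I have in mind would look roughly like this (just a sketch; the stats table layout and trigger names are illustrative, not something I've built yet):

CREATE TABLE stats (
    table_name VARCHAR(64) PRIMARY KEY,
    row_count  BIGINT NOT NULL
);

-- seed the counter once with the current count
INSERT INTO stats (table_name, row_count)
SELECT 'my_table', COUNT(*) FROM my_table;

CREATE TRIGGER my_table_count_ins AFTER INSERT ON my_table
FOR EACH ROW
    UPDATE stats SET row_count = row_count + 1 WHERE table_name = 'my_table';

CREATE TRIGGER my_table_count_del AFTER DELETE ON my_table
FOR EACH ROW
    UPDATE stats SET row_count = row_count - 1 WHERE table_name = 'my_table';

The live query would then just be SELECT row_count FROM stats WHERE table_name = 'my_table'.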

EDIT: Here's a perfect example of the type of thing I'm trying to accomplish: https://www.freelancer.com. Scroll to the bottom of the page and you'll see:

[screenshot of the live statistics counters at the bottom of the freelancer.com page]

Those numbers update every second or so.

Nate
  • A simple `select count(*) from my_table` is almost certainly impossible to improve with indexes. If it's taking too long you're down to changing the structure of the table/how it's stored on disk etc. or improving your metal, either through optimisations or through buying better. You might also want to consider a different solution, maybe keeping track of the number of inserted and deleted rows and displaying the result of that calculation in your app, with a hard refresh from your main table every N minutes (this is one method - not necessarily the best). – Ben Feb 21 '15 at 15:54
  • Seems like it would be "good enough" to say 36,000, 14,521,000, 7,181,000. And recompute those numbers only once a minute. Or maybe "Over 36,453", etc. (I subtracted 1 to be 100% truthful.) – Rick James Feb 21 '15 at 23:17

4 Answers

1

It takes time to read 5 million records and count them -- whether in an index or in the raw data form.

If a "fast-and-dirty" solution is acceptable, you can use metadata:

SELECT table_rows
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = <whatever> and TABLE_NAME = <whatever2>;

Note that this can get out-of-sync.

Another possibility is to partition the table into smaller chunks. One advantage is that if the inserts and deletes tend to be to one partition, you can just count that and use metadata for the other partitions.
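
A sketch of that idea, assuming my_table has an AUTO_INCREMENT id as its primary key so that new rows (and most deletes) land in the highest range; the partition names and boundaries are illustrative:

ALTER TABLE my_table
PARTITION BY RANGE (id) (
    PARTITION p0    VALUES LESS THAN (2000000),
    PARTITION p1    VALUES LESS THAN (4000000),
    PARTITION p_hot VALUES LESS THAN MAXVALUE
);

-- exact count of only the "active" partition (MySQL 5.6+ syntax)
SELECT COUNT(*) FROM my_table PARTITION (p_hot);

-- approximate counts of the stable partitions from metadata
SELECT SUM(TABLE_ROWS)
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'my_db' AND TABLE_NAME = 'my_table'
  AND PARTITION_NAME <> 'p_hot';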

A trigger may or may not help in this situation, depending on the insert/delete load. If you are doing multiple inserts per minute, then a trigger is a no-brainer -- a fine solution. If you are doing dozens or hundreds of changes per second, then the overhead of the trigger might slow down the server.

Gordon Linoff
    Caution: `table_rows` is sometimes as much as a factor of 2 off from the actual count. – Rick James Feb 21 '15 at 16:12
  • "fast-and-dirty" isn't good enough for this, it needs to be fairly accurate (definitely within 1%). I'm not sure how to measure the number of inserts & deletes per second, but if I had to make a guesstimate I'd say 50-100 per second. – Nate Feb 21 '15 at 16:19
  • @Nate . . . If you are doing that many database operations per second, then the number is changing too fast in real-time to actually observe. This means that a slightly out-of-synch number is probably a very good solution. – Gordon Linoff Feb 21 '15 at 21:06
1

If your system is so busy that the counting is having too much impact, then probably the INSERTing/DELETEing is also having impact. One way to improve INSERT/DELETE is to do them in 'batches' instead of one at a time.

Gather the INSERTs, preferably in the app, but optionally in a 'staging' table. Then, once a second (or whatever) copy them into the real table using an INSERT..SELECT, or (if needed) INSERT..ON DUPLICATE KEY UPDATE. DELETEs can go into the same table (with a flag) or a separate table.

The COUNT(*) can be done at the end of the batch. Or it could be dead reckoned (at much lower cost) by knowing what the count was, then adjusting by what the staging table(s) will change it by.
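
A rough sketch of the once-a-second flush, assuming the staging table carries an op column marking inserts versus deletes (all of these names are illustrative, not from the blog linked below):

START TRANSACTION;

INSERT INTO my_table (col1, col2)
    SELECT col1, col2 FROM staging WHERE op = 'I';

DELETE my_table
    FROM my_table JOIN staging ON staging.row_id = my_table.id
    WHERE staging.op = 'D';

-- dead-reckon the counter instead of re-counting 5M rows
UPDATE counter SET ct_my_table = ct_my_table
    + (SELECT COUNT(*) FROM staging WHERE op = 'I')
    - (SELECT COUNT(*) FROM staging WHERE op = 'D');

DELETE FROM staging;   -- or swap to the other staging table, per the blog below

COMMIT;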

This is a major upheaval to your app code, so don't embark on it unless you have spikes of, say, >100 INSERTs/DELETEs per second. (A steady 100 INSERTs/sec = 3 billion rows per year.)

For more details on the "staging table" approach, see http://mysql.rjweb.org/doc.php/staging_table. Note that the blog advocates flip-flopping between a pair of staging tables, so as to minimize locks on them and to allow multiple clients to coexist.

Rick James
0

Have a job running in the background that does the following; then use its table for getting the count:

Loop:
    INSERT INTO Counter (ct_my_table)
        SELECT COUNT(*) FROM my_table;
    sleep 1 second
end loop

At worst, it will be a couple of seconds out of date. Also note that INSERTs and DELETEs interfere with (read: slow down) the SELECT COUNT(*), hence the "sleep".
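
One way to run that background job inside MySQL itself is the event scheduler (a sketch; the table and event names are illustrative, and the interval is stretched to 2 seconds so runs don't overlap a ~1.4-second count):

CREATE TABLE Counter (
    id          TINYINT PRIMARY KEY,
    ct_my_table BIGINT NOT NULL
);

SET GLOBAL event_scheduler = ON;

CREATE EVENT refresh_my_table_count
ON SCHEDULE EVERY 2 SECOND
DO
    REPLACE INTO Counter (id, ct_my_table)
    SELECT 1, COUNT(*) FROM my_table;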

Have you noticed that some UIs say "About 120,000 thingies"? They are using even cruder estimations. But it is usually good enough for the users.

Rick James
  • Doesn't the fact that the query takes more than a second mean that it's stressing the server? I.e. I don't want to be constantly making a query that bogs the server down. – Nate Feb 21 '15 at 16:22
  • @Nate - Yes, it is stressing the server. No, there is no way to get the _exact_ count from an InnoDB table of this size in less than 1.xx seconds. So, the goal becomes: "Get a good enough answer with little enough impact on the system." If you are happy with "good enough" then use `table_rows` or stale counters. If you need "less impact", then the loop approach should be tuned (adjust the seconds) to loop infrequently enough to have low impact. – Rick James Feb 21 '15 at 18:10
0
  • Take the inaccurate value from information_schema, as Gordon Linoff suggested
  • Another inaccurate estimate of the row count is SELECT MAX(id) - MIN(id)
  • Create a table my_table_count where you store the row count of my_table and keep it updated with triggers

In many cases you don't need an accurate value. Who cares if you show 36,400 users instead of the accurate 36,454?

akuzminsky