Optimize SQL to get rows count

Question

I have a page on my site that keeps track of the number of people accessing it, on another part I displays the data containing information about the users that access this page, it displays only about 10 at a time.

The problem is I need to create pagination so I need to know how much data is on my table at every time and this causes the display page to take some time to load 2-3 seconds, sometimes 7-10, because I have millions of record. I am wondering, how do I get this page to load faster.

Select COUNT(*) as Count from visits

Do you have auto increment column ? will you delete records from this table ? — Pரதீப், Oct 16 '16 at 13:17
@Prdp I do have auto increment, and I am not deleting data yet, but I will in the future — George, Oct 16 '16 at 13:18
What DB Engine is used? Solution depends if it is MyISAM, InnoDB, or something else. — nempat, Oct 16 '16 at 13:22
If you are not deleting the you can run this `select auto_increment_column from visits order by auto_increment_column desc Limit 1` — Pரதீப், Oct 16 '16 at 13:24
well it is possible to save summary rollup counts at different levels tweaked as they occur. As such no constant recounting is necessary. You would need Intention Locks though. They are nearly instantaneous to get and release the lock for updates of the summary numbers. Something to chew on — Drew, Oct 16 '16 at 13:29
You can create and index on a column of the table, and count by that index. — nempat, Oct 16 '16 at 13:32

score 1 · Answer 1 · answered Oct 16 '16 at 13:30

1

My first response is . . . if you are paging records 10 at a time, why do you need the total count of more than a million?

Second, counting a million rows should not take very long, unless your rows are wide (lots of columns or wide columns). If that is the case, then:

select count(id) from t;

can help, because it will explicitly use an index. Note that the first run may be slower than subsequent runs because of caching.

If you decide that you do need an exact row count, then your only real option for speeding it up using MySQL is to create triggers to maintain the count in another table. However, that will slow down inserts and deletions, which might not be a good idea.

answered Oct 16 '16 at 13:30

Gordon Linoff

1,242,037
58
646
786

`COUNT(id)` cannot be faster than `COUNT(*)`. The former checks each `id` for being `NULL`. And, in InnoDB, it disallows picking a smaller index. – Rick James Oct 17 '16 at 04:13
@RickJames . . . Do you know if `COUNT(*)` will choose the smallest index for the count? My concern is that MySQL might read the data pages for `COUNT(*)` rather than just reading the pages of an index. – Gordon Linoff Oct 17 '16 at 04:19
InnoDB can pick any index for the full-table scan for `COUNT(*)`. And, yes, it picks the "smallest", though I don't know what metric it uses for "smallest". It does fail to take into account whether the smallest index is currently cached; this _may_ be changing with version 8.0. – Rick James Oct 17 '16 at 04:25
If `id` is the `PRIMARY KEY`, then that index is clustered with the data, hence `COUNT(id) forces a full table scan. – Rick James Oct 17 '16 at 04:27
@RickJames . . . The "smallest" would be the fewest number of index pages. I was thinking that if the records are wide, then `count(*)` would typically read all the records. However, `count(othercol)` could get by with just reading the index for `othercol` -- and that would be a cost savings. – Gordon Linoff Oct 17 '16 at 11:25

score 1 · Answer 2 · answered Oct 17 '16 at 04:21

1

The best answer is to say "About 1,234,000 visits", not the exact number. Then calculate it daily (or whatever).

But if you must have the exact count, ...

If this table is "write only", then there is a solution. It involves treating it as a "Fact" table in a Data Warehouse. Then create and maintain a "Summary table" with a row for, say, each hour. Then the COUNT becomes:

SELECT SUM(hourly_count) FROM SummaryTable;

This will be much faster because there is much less to scan. However, there is a problem in that it does not include the count for the last (partial) hour. But that can be solved if you use INSERT ... ON DUPLICATE KEY UPDATE ... to increment the counter for the current hour or insert a new row with a "1".

Some more info is here .

But, before we take this too far, please inform us of how often a new 'visit' occurs.

answered Oct 17 '16 at 04:21

Rick James

135,179
13
127
222

Sorry for the late response to your question, there is at least a visit every second, currently average of 10 visits per second – George Oct 18 '16 at 23:13
10/second = 8M/year. In most people's mind, that is not much different than 5M or 10M. Also, in the time it takes to compute and display the exact count, it will have gone up by 20-30. (These are arguments against working hard to get the exact count.) – Rick James Oct 18 '16 at 23:29
1

Another approach: Once a minute, compute the exact count and store it into a single-cell table for use by the UI. Round to the nearest 1000 when displaying; this will give users a clue that it may not be precise. – Rick James Oct 18 '16 at 23:32

score 0 · Answer 3 · answered Oct 16 '16 at 13:20

0

You cannot make that query get faster without changing the server's hardware or adding more servers to run it in parallel. In the second case it would be better to move to a nosql database.

My approach would be to reduce the number of records. That you could do by having some temporary table where you record the access logs for the past hour/day and after that time run a cronjob that deletes the data, or moves it to another table for log term storage.

answered Oct 16 '16 at 13:20

Mihai Popescu

386
1
11

I need all most realtime feedback on users activities, I cannot use a cronjob. Changing databases at this time might not be the best option for us. – George Oct 16 '16 at 13:23

score 0 · Answer 4 · answered Oct 16 '16 at 13:54

0

You usually do not need to know exact number of rows for pagination

SELECT COUNT(*) FROM
(SELECT TOP 10000 * FROM visits) as v

would tell You, that there are at least 1000 pages. In most cases You do not need to know more.

You can store total count somewhere and update it from time to time if You want some reasonable estimate. If You need exact number, You can use trigger to keep it actual. The more up to date info, the more expensive, of course.

answered Oct 16 '16 at 13:54

Antonín Lejsek

6,003
2
16
18

That smells like SQL Server syntax, not MySQL. – Rick James Oct 17 '16 at 04:10
You are right, too sad this is not unified. Imagine LIMIT instead. – Antonín Lejsek Oct 17 '16 at 22:48

score 0 · Answer 5 · answered Oct 18 '16 at 13:59

Decide on limit (let's say, 1000 last ones) from practical (business requirements) point of view. Have auto_increment index (id) or timestamp (createdon). Grab max 1000 records

select count(*) from (select id from visits order by id desc limit 1000)

or grab all 1000 and count paginate on the client side (php) (as if you paginate mysql will still go through those records):

select * from visits order by id desc limit 1000

Optimize SQL to get rows count

5 Answers5