
I'm re-implementing an app to support a national engineering contest, moving it from a local server to the cloud.

In order to tell a team where they stand at the moment, the query has the form

select 1 + count(*) from team where score < ?

Teams' scores change very dynamically. There may be up to 2 million teams, and I need to handle at least 10 of these queries per second.

The original gets the needed performance (actually it already did with 1999 hardware) by using a separate Berkeley DB of team/score records. There is a "record number" feature in Berkeley DB that provides exactly the right functionality, and it's very fast.

Heroku apparently has no way to support Berkeley DB. PostgreSQL, their standard DB, does select count(*) with a full table or index scan, which is way too slow.

Any ideas on how to proceed? I am not wedded to Heroku, but must move to a cloud solution of some kind.

Clodoaldo Neto
Gene

4 Answers

2

Create a rank table and update as frequently as adequate. Include the category (open or official) and score so you don't have to join it to the team table at query time:

create table "rank" (
    team integer primary key, 
    category integer,
    score integer,
    rank_consolidated integer,
    rank_category integer
);

begin;
truncate table "rank";
insert into "rank" (team, category, score, rank_consolidated, rank_category)
select 
    team, category, score,
    rank() over(order by score desc) rank_consolidated,
    rank() over(partition by category order by score desc) rank_category
from team;
commit;
select * from "rank" where team = 11;

As for the exact ranking behavior, look into window functions.

Clodoaldo Neto
  • I appreciate the response but I have a real time solution (no background processing) now, and there are many advantages to keeping it that way. Will fall back to this if no other answer appears. – Gene Oct 08 '12 at 17:44
  • @Gene Adding, deleting or updating a member to an ordered set causes the index to be updated which has an associated cost. That will happen in both Redis and Postgresql. All binary search is O(log n). You can do a ranking in postgresql just as fast, if not faster, as in Redis. I would be very surprised if Redis index implementation is any faster than Postgresql's btree. I suggested the background processing just to make it extremely fast. If you do the rank in real time it will be as fast or faster than Redis. If the number of teams ever gets to 2 million the difference will show. – Clodoaldo Neto Oct 08 '12 at 18:13
  • 1
    Rdis ZRANK computes the order statistc (rank) of a key in O(log N). To do this it uses leaf counts in each B-tree node. (look up `order statistic tree`) Postgresql, MySQL, MSQL, and Sybase, which I have checked personally, require O(N) to get the rank. They don't maintain leaf counts because these cause bad locking behavior in highly parallel applications. So to get an order statistic (count(*) where x), they each scan either the entire index or the entire table. – Gene Oct 08 '12 at 19:33
  • Sorry I should have said Skip List node, which is the Redis ordered set data structure. Skip lists are related to B-trees and also allow fast maintenance and retrieval of order stats. – Gene Oct 08 '12 at 19:45
2

Use redis to store your team data in a sorted set. Then the ZRANK function will return the count you need. Redis is very fast in general, and the ZRANK function is O(log N) expected. It is implemented with skip lists.
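Not part of the original answer: the sorted-set rank semantics can be modeled in plain Python with the standard `bisect` module. This is an in-memory sketch only; real code would issue ZADD/ZRANK against a Redis server, where skip-list insertion is O(log N) rather than the list's O(N).

```python
import bisect

# Stand-in for the Redis sorted set: a list kept in score order.
scores = []

def add_score(score):
    # Models ZADD; O(n) for a Python list, O(log n) in Redis's skip list.
    bisect.insort(scores, score)

def standing(score):
    # 1 + number of teams with a strictly lower score, i.e. the SQL
    #   select 1 + count(*) from team where score < ?
    # Models ZRANK's count-of-smaller-members behavior.
    return 1 + bisect.bisect_left(scores, score)

for s in (10, 40, 40, 70):
    add_score(s)
```

With these four scores, `standing(70)` is 4 and the two tied teams at 40 both get standing 2, which matches the `count(*) where score < ?` definition of rank.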

Gene
hgmnz
  • Thanks. I think redis is what I need, but I am going to edit your answer to express how it actually must be used to solve my problem. Thanks for the pointer! – Gene Oct 08 '12 at 16:25
  • Sounds good to me! sorted sets work, but I would've used INCR as originally posted. It's not uncommon to model your data in redis such that you have one redis key per team, for instance: INCR team:, GET team:, etc – hgmnz Oct 09 '12 at 19:54
  • I can't see how to solve the problem with INCR. I need to store up to 10^6 pairs and then be able to query "How many teams have score less than X?" The pairs are constantly changing while these queries are occurring. – Gene Oct 10 '12 at 01:24
  • Ahhhh. Yes, I misunderstood what you were solving (i thought it was merely keeping a counter around for whatever reason). Completely agree with your approach! – hgmnz Oct 10 '12 at 18:13
0

Putting an index on score should avoid the full table scan.

Marlin Pierce
  • Yes, but the OP has already stated that an index scan is too slow. – dbenham Oct 07 '12 at 15:27
  • It still scans the index, and that's too slow. I'm pretty sure something with O(log n) performance is needed. That's what Berkeley DB provides. – Gene Oct 07 '12 at 16:27
  • The hope was that PostgreSQL might keep metadata in the index, like the position number, or how many nodes are under each branch of the b-tree. – Marlin Pierce Oct 07 '12 at 18:22
  • @Gene In PostgreSQL 9.1 and older it scans the table contents, too, because of visibility requirements. Only in PostgreSQL 9.2 and above are index-only scans (which don't touch the main table) supported. – Craig Ringer Oct 08 '12 at 08:06
  • @MarlinPierce It *does* keep some of that info, but not information about how many of those nodes are *visible to the current transaction*, so it has to look at the table contents to get the xmax/xmin/ctid etc and work out whether to count that particular tuple. Consider Pg's MVCC design. – Craig Ringer Oct 08 '12 at 08:07
  • Thanks. The point is that even an index scan is too slow with 10^6 records. Berkeley DB keeps counts on B-tree nodes that allow the rank of a node to be found as a query for the node's key is being processed. So it's only about `log_b N` node accesses, where N=10^6 in this case and b=node branch factor. It hits at most 8 nodes in any query, avoiding a scan of 10^6 records. General purpose SQL DBs don't keep the required counts because they mean (more or less) that the entire DB must be locked during updates, which slows down multi-threaded access. In my app multi-threading is not important. – Gene Oct 08 '12 at 16:22
  • RDBMSs use B-trees as indexes, and might keep track of the number of nodes along each branch. It does not require locking the entire database, or even the table, to keep this number up to date. The number can be updated as the index changes, when records are added, deleted, or the value of the indexed field (score) changes. With such data, the index could be used in log_b(N) order by searching the index and adding up all B-tree branches where score is less than the given value, thus avoiding a full index scan. It depends upon the implementation of the database product. – Marlin Pierce Oct 08 '12 at 16:53
  • From @Gene's comment, if not for the visibility requirements, the data is there and could be used if that was the implementation for count. – Marlin Pierce Oct 08 '12 at 16:56
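To make the counted-tree ("order statistic tree") idea from these comments concrete, here is a small Python sketch, not taken from the thread. Subtree sizes maintained on insert let a rank query visit one node per tree level instead of scanning the whole index. It is illustrative only: the tree is unbalanced, while Berkeley DB and Redis use balanced on-disk B-trees and skip lists respectively.

```python
class Node:
    """A node in a toy order-statistic BST; `size` counts this subtree."""
    __slots__ = ("key", "left", "right", "size")

    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.size = 1

def insert(node, key):
    # Unbalanced BST insert; every node on the path gets its size bumped.
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size += 1
    return node

def count_less(node, key):
    # Number of stored scores strictly below `key`. Each step descends one
    # level, adding the size of any left subtree that is entirely smaller,
    # so the work is proportional to tree height, not to the record count.
    n = 0
    while node is not None:
        if key <= node.key:
            node = node.left
        else:
            n += 1 + (node.left.size if node.left else 0)
            node = node.right
    return n

root = None
for score in (10, 40, 40, 70):
    root = insert(root, score)
# A team's standing is 1 + count_less(root, its_score).
```

This is exactly the `select 1 + count(*) from team where score < ?` computation, answered from node counts rather than a scan.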
0

If it's read from a lot more heavily than it's written to and it always has to be up to date, then this is an ideal job for a trigger-maintained summary table (sort of a materialized view).

Have a trigger on the team table that, AFTER INSERT OR UPDATE OR DELETE, FOR EACH ROW, executes a trigger function that updates the team_summary table entry for that team with the new score.

The team_summary table can be accessed via a simple, direct index lookup by equality so it'll be crazy fast. Since Pg supports simultaneous readers and writers the team_summary table will remain responsive even if it's being updated very frequently. The only thing you really need to do for best results is set FILLFACTOR to something like 50 in the team_summary table so that HOT can work well, and make sure autovacuum is set to run quite often to spread the load of the vacuum I/O churn.

Writing the trigger should be pretty trivial. You just have to be careful to write a concurrency-safe trigger that won't break when you have concurrent updates to the same team by multiple concurrent connections. Something like:

UPDATE team_summary SET score = score + 1 WHERE team_id = NEW.team_id;

should be fine under both SERIALIZABLE and READ COMMITTED isolation. See Concurrency control. The only hard bit is that you must make sure to insert a new row into team_summary before inserting the first row for a new team into team, so your trigger doesn't have to handle the surprisingly tricky case where the team_summary row might not yet exist. Getting the upsert/merge right for that is kind of tricky.

If the write rate is also very high and you can get away with the results being updated only every few seconds/minutes, use Clodoaldo's approach instead.

Craig Ringer