Counting rows before proceeding to actual searching

Question

Given an web app (Java, Spring, Hibernate and Sybase as DB) with several, say 5 different search screens, I want to count first if the search result based on the user's criteria will be exceeding a limit, say 1000 rows. Results that are huge, going past 1000 can happen even if user provides reasonable filters and criteria.

Is doing it this way recommended:

select count(*) from table --clauses, etc here
then if > 1000, don't do actual search, return and show limit error (tell user to refine search)
else if < 1000, do the actual search and give back the resultset to user

Or is there a better solution to handle this?

If this is the way to go, my followup question would be, how can we avoid duplicating the sql query? Because I understand doing this, will require me to declare the same search sql except the select clause will only contain count(*).

UPDATES

Additionally, I want to avoid 2 things: 1. processing from executing the actual sql 2. loading/mapping of the domain objects by the ORM (Hibernate in this case) * both 1 & 2 are avoided when I detect that the count is > 1000.

You're likely not avoiding (1) when do you a COUNT first. Only thing you're avoiding is sending the results back from the DB. Guess on (2) is that there's no way around this if you want to use the ORM unless you want to have two queries. — dfb, Jul 25 '12 at 05:12
In some cases, yes. There will be some additional I/O overhead to retrieve a row vs. count it, but the query can't know the count without all the things up to the limit. — dfb, Jul 25 '12 at 05:23

score 2 · Accepted Answer · answered Jul 25 '12 at 05:03

2

I wouldn't run a COUNT(*) at all, just run the query with a LIMIT 1001. It's likely you are generating the exact same result set (i.e., to do the COUNT, you have to generate the result set) in the count and the next hit will be from the cache, or at worst you'll have to recalculate. You're just doing the same work twice

answered Jul 25 '12 at 05:03

dfb

13,133
2
31
52

i see, for a while there, i thought doing the COUNT version was not as expensive as executing the actual sql... – Carlos Jaime C. De Leon Jul 25 '12 at 05:24
It might save you a bit of disk I/O, but you're also making two trips to the database. I don't have any definitive statistics to reference, but in my experience this isn't likely to be a big bottleneck – dfb Jul 25 '12 at 05:26
Ok, will doing select top 1000 * from table be same as limit? I forgot to mention we are using Sybase. – Carlos Jaime C. De Leon Jul 25 '12 at 05:30

score 0 · Answer 2 · answered Jul 25 '12 at 05:01

We followed the same procedure for our application as well. and Yes the only difference will be of placing count(1) instead of * in the SQL.

However you might need to understand that on occasions the Count Query is the one which takes more time then fetching a subset of results.

score 0 · Answer 3 · answered Jul 25 '12 at 05:01

0

Depending on how you are retrieving the rows from the row set, you could simply filter the results at that level.

ie

int rowIndex = 0;
while (rs.hasNext() && rowIndex < 1000) {
    // ... Extract results
    rowIndex++;
}

You may want to warn the user that there result set has been trimmed though ;)

answered Jul 25 '12 at 05:01

MadProgrammer

343,457
22
230
366

thanks for the comment, however, i want to avoid 2 things: executing the actual query and the loading/mapping to our domain object (which the orm does for us) which can consume large space in heap though – Carlos Jaime C. De Leon Jul 25 '12 at 05:05
No worries, was just a wild thought ;) – MadProgrammer Jul 25 '12 at 05:07

Counting rows before proceeding to actual searching

3 Answers3

Linked