
There's an interesting design decision I've been thinking about lately. Let's say I'm adding usernames to a table, and I want to make sure there are no duplicates. The username column is NOT NULL UNIQUE. I could either:

  1. Query the database before inserting to make sure there are no duplicate names, or
  2. Just INSERT, and catch any exceptions that come from the database engine.

Assuming the DB I'm using is capable of enforcing constraints, I was wondering what situations each of these choices is appropriate in.
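
For concreteness, here is roughly what the two options look like from application code. This is just a minimal sketch using Python's built-in sqlite3 module; the table and column names are made-up stand-ins for whatever schema and driver you actually have:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT NOT NULL UNIQUE)")

    def add_user_check_first(username):
        """Option 1: query first, then insert only if no duplicate exists."""
        cur = conn.execute("SELECT 1 FROM users WHERE username = ?", (username,))
        if cur.fetchone() is not None:
            return False  # duplicate found by the pre-check
        conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return True

    def add_user_catch_exception(username):
        """Option 2: just INSERT and catch the constraint violation."""
        try:
            conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
            return True
        except sqlite3.IntegrityError:  # raised when the UNIQUE constraint fires
            return False

Note that option 1's check-then-insert is not atomic under concurrent inserts, which is one reason the constraint has to stay on the column either way.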

zildjohn01

3 Answers


It almost always seems like a good idea to go with option 2. I wouldn't recommend option 1, because you've effectively doubled the time required for every insert (each one needs a read first). Besides, some new developer will eventually commit code that skips the check, and the invariant will get broken.

Another thing to consider is how much downtime is acceptable. Is this a mission-critical app? What happens if the data is corrupted? Will factories shut down because of it, or will it just cause some annoying bugs?

You can't afford to have your factories shut down because some exception you didn't think of crashed your server. So a nightly or weekly check on data correctness can also help in a case like that. Either way, I feel the DB's ability to enforce uniqueness (and potentially other constraints) is the appropriate way to go.
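
As a rough sketch of what such a nightly or weekly correctness check could look like (assuming a Python job; the database file and table names here are hypothetical), a grouped query can flag any duplicates that slipped in:

    import sqlite3

    conn = sqlite3.connect("app.db")  # hypothetical database file
    dupes = conn.execute(
        """SELECT username, COUNT(*) AS n
             FROM users
            GROUP BY username
           HAVING COUNT(*) > 1"""
    ).fetchall()

    # With the UNIQUE constraint in place this should print nothing; any hit
    # means the constraint was dropped or bypassed and needs investigating.
    for username, n in dupes:
        print(f"ALERT: username {username!r} appears {n} times")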

corsiKa
  • I updated my answer with a quick demo of how much more expensive it is to catch an exception than to do a query before the insert. Catching an exception does far more than double the time required to do an insert, at least in Oracle. – Justin Cave Feb 18 '11 at 15:32

Can you cache the username list and check it on the application side without going to the database? You should still have the unique constraint on the database to ensure no bad data gets in (always protect the data at the database level first and foremost), but if you can do the check from a cache, you can save a whole round trip to the database when someone selects the same username as an existing user. This may depend on the size of the data you would need to cache and how often the cache would have to be updated. Not knowing your system, I can't say if it is practical, but I would at least look into it.
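
A minimal sketch of that idea (Python, with hypothetical names; the database constraint remains the real guarantee, the cache is only an optimization):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT NOT NULL UNIQUE)")

    # Load the cache once at startup; refresh it on whatever schedule
    # the size and churn of your data allow.
    username_cache = {row[0] for row in conn.execute("SELECT username FROM users")}

    def add_user(username):
        if username in username_cache:
            return False  # rejected in memory, no database round trip
        try:
            conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        except sqlite3.IntegrityError:
            # A stale cache or a concurrent insert beat us to it; the
            # constraint is still the final arbiter.
            return False
        username_cache.add(username)
        return True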

HLGEM

Do you expect that the new username is likely to be unique? Or is it likely that it will be a duplicate? If the username is likely to be unique, doing the insert and catching the exception would be more efficient. If the username is likely to be a duplicate, it will be more efficient to check for duplicates (and potentially look for a similar but not yet taken username) rather than trying to catch the exception. Obviously different databases and different versions of those databases have a different breakeven point on the relative probabilities. But in general, if you're building a system for a company where everyone has a unique username anyway, do the insert and catch the exception. If you're building Hotmail, check for duplicates first.

A quick demo (on Oracle 11.2.0.1) shows that it's roughly 7 times as expensive to do an insert that fails and handle the exception as it is to do a check before the insert and then write the data.

SQL> create table username_test (
  2    username varchar2(30) unique
  3  );

Table created.

SQL> set timing on;

SQL> ed
Wrote file afiedt.buf

  1  declare
  2    l_cnt integer;
  3  begin
  4    for i in 1 .. 100000
  5    loop
  6      select count(*)
  7        into l_cnt
  8        from username_test
  9       where username = 'JCAVE';
 10      if( l_cnt = 0 )
 11      then
 12        insert into username_test( username )
 13          values( 'JCAVE' );
 14      end if;
 15    end loop;
 16* end;
SQL> /

PL/SQL procedure successfully completed.

Elapsed: 00:00:04.20
SQL> rollback;

Rollback complete.

Elapsed: 00:00:00.00

SQL> ed
Wrote file afiedt.buf

  1  declare
  2    l_cnt integer;
  3  begin
  4    for i in 1 .. 100000
  5    loop
  6      begin
  7       insert into username_test( username )
  8          values( 'JCAVE' );
  9      exception
 10        when dup_val_on_index then
 11          null;
 12      end;
 13    end loop;
 14* end;
SQL> /

PL/SQL procedure successfully completed.

Elapsed: 00:00:29.58
Justin Cave
  • Are you implying it's faster to do two queries, one that checks and one that inserts, than it is to simply do the insert and see if it fails? I would have to see a benchmark that shows that, because it doesn't seem logical. – corsiKa Feb 18 '11 at 15:21
  • @glowcoder - Yes, I am. Remember that exception handling is expensive-- it is supposed to be used when something unexpected happens. A single-row check is generally going to be cheaper than handling an exception. – Justin Cave Feb 18 '11 at 15:24
  • 1
    But the exception would be the rare case, the more common case of the insert working woudl be slower. – HLGEM Feb 18 '11 at 15:31
  • @Justin I would agree that exception handling is more expensive than your typical line of code, but like I said I would want to see a benchmark that says it's more expensive than an entire second query before I believe it. I would also disagree with the idea that exceptions are unexpected. Exceptions exist to handle when the input doesn't conform to the business logic, which is exactly what this describes. Unless there's an identified performance blocker that says "that exception handling grinds our system", then it's a premature optimization to subvert it with a preemptive query. – corsiKa Feb 18 '11 at 15:31
  • I think we all agree that it depends on how likely a username is to be a duplicate-- the single company app where everyone already has a unique company-wide username is obviously different than a public internet app that is going to have millions of users where you're likely to have to try 3 or 4 usernames before you find one that isn't taken. – Justin Cave Feb 18 '11 at 15:36
  • @Justin First, I commend actually putting in your benchmark. This establishes that it is 7x slower to insert and fail than to insert and succeed. But, let's also consider that all 100k failed inserts took less than 30 seconds. That's still only 0.0003 sec per failure. Chances are, you're going to spend more time than that in transactional overhead getting the result back to your application. That overhead would be doubled if you didn't use exceptions, which, imo, shows you should be using exceptions over the second query in all cases. – corsiKa Feb 18 '11 at 15:40