I'm trying to insert some data without duplicates, possibly from 2+ concurrent processes.

Unfortunately I cannot use a unique constraint in this case due to the database design (deleted rows are marked with deleted=1, and those can exist as duplicates).

It seems that a simple transaction won't work. The best I could come up with is SELECT ... FOR UPDATE, but that's not enough: if no matching rows exist yet, none get locked, so it doesn't prevent a concurrent insert. On the other hand, I'd like to avoid locking the whole table for writing.
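
Roughly, the attempt looks like this (a minimal sketch; the table name and address value are placeholders, columns as in the schema below):

START TRANSACTION;
-- locks matching rows, but if no matching row exists yet nothing is locked,
-- so two concurrent transactions can both see an empty result here
SELECT id FROM the_table WHERE deleted = 0 AND address = '123' FOR UPDATE;
-- ...and both reach this INSERT, creating a duplicate
INSERT INTO the_table (deleted, address) VALUES (0, '123');
COMMIT;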

Is there a nice way to work around this problem? The table engine is InnoDB. (A secondary question is how to do this in SQLAlchemy, but I can translate the solution if it works in general.)

Edit: you can assume schema:

deleted tinyint(1) default null,
id int(11) not null auto_increment,
address varchar(255) default null,
...

where address is supposed to be unique for entries where deleted == 0.
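
Put together, the table looks roughly like this (a sketch; the table name is a placeholder, the remaining columns are omitted, and id is the primary key):

CREATE TABLE the_table (
  id      INT(11)      NOT NULL AUTO_INCREMENT,
  deleted TINYINT(1)   DEFAULT NULL,
  address VARCHAR(255) DEFAULT NULL,
  -- other columns omitted
  PRIMARY KEY (id)
) ENGINE=InnoDB;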

viraptor

2 Answers


Along those lines:

insert into target
  select * from source1
  union
  (select * from source2 where not (source2.id in (select id from source1)))

Add more union clauses for more tables.
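
For example, with a third (hypothetical) source table the same pattern extends like this:

insert into target
  select * from source1
  union
  (select * from source2 where not (source2.id in (select id from source1)))
  union
  (select * from source3 where not (source3.id in (select id from source1))
                            and not (source3.id in (select id from source2)))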

Alex Monthy
  • My scenario is two processes doing `insert into the_table (deleted, address) values (0, '123')`, rather than inserting from two sources in one process. – viraptor Aug 22 '12 at 16:42

You can try this insert query:

INSERT IGNORE INTO tbl(id,deleted,address)
SELECT CASE WHEN EXISTS (SELECT id FROM tbl 
                         WHERE deleted=0 AND address='new_address')
       THEN id ELSE NULL END,
       0,
       'new_address'
FROM tbl
LIMIT 1

If a row with the given address and deleted=0 already exists in your table, the query will try to insert a row with an id that already exists, which obviously won't happen as id is the primary key (INSERT IGNORE suppresses the error). But if there is no such row, it will try to insert a row with NULL as the id, which will succeed and let auto_increment assign a new value.
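
To tell from the application whether a new row was actually inserted or the duplicate was skipped, you can check the affected-row count right after the statement (a sketch, plugging in the sample address from the question's comment; MySQL reports 0 affected rows when INSERT IGNORE skips the row):

INSERT IGNORE INTO tbl(id,deleted,address)
SELECT CASE WHEN EXISTS (SELECT id FROM tbl
                         WHERE deleted=0 AND address='123')
       THEN id ELSE NULL END,
       0,
       '123'
FROM tbl
LIMIT 1;

-- 1 if a new row was inserted, 0 if the duplicate was ignored
SELECT ROW_COUNT();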

piotrm