I'm trying to create an SQL query that will detect (possible) duplicate customers in my database.
I have two tables:
- Customer, with the columns cid, firstname, lastname and zip. cid is the unique customer ID and the primary key of this table.
- IgnoreForDuplicateCustomer, with the columns cid1 and cid2. Both columns are foreign keys referencing Customer(cid). This table is used to record that the customer with cid1 is not the same person as the customer with cid2.
For example, suppose I have
- a Customer entry with cid = 1, firstname = "foo", lastname = "anonymous" and zip = "11231"
- and another Customer entry with cid = 2, firstname = "foo", lastname = "anonymous" and zip = "11231".
My SQL query should search for customers that have the same firstname, lastname and zip, and then detect that the customer with cid = 1 is the same as the customer with cid = 2.
However, it should also be possible to declare that customers cid = 1 and cid = 2 are not the same, by storing a new entry in the IgnoreForDuplicateCustomer table with cid1 = 1 and cid2 = 2.
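To make the setup reproducible, here is a minimal sketch of the two tables and the example data. It assumes SQLite through Python's built-in sqlite3 module purely so the snippet runs on its own; I haven't said which database I actually use, and the column types (INTEGER/TEXT) are guesses.

```python
import sqlite3

# In-memory SQLite database (an assumption for the example); the schema
# mirrors the two tables described above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE Customer (
    cid       INTEGER PRIMARY KEY,
    firstname TEXT,
    lastname  TEXT,
    zip       TEXT
);
CREATE TABLE IgnoreForDuplicateCustomer (
    cid1 INTEGER NOT NULL REFERENCES Customer(cid),
    cid2 INTEGER NOT NULL REFERENCES Customer(cid)
);
""")

# The two look-alike customers from the example.
conn.executemany(
    "INSERT INTO Customer (cid, firstname, lastname, zip) VALUES (?, ?, ?, ?)",
    [(1, "foo", "anonymous", "11231"),
     (2, "foo", "anonymous", "11231")],
)
# The rule "customer 1 is NOT the same as customer 2".
conn.execute("INSERT INTO IgnoreForDuplicateCustomer (cid1, cid2) VALUES (1, 2)")
conn.commit()

print(conn.execute("SELECT * FROM Customer ORDER BY cid").fetchall())
# [(1, 'foo', 'anonymous', '11231'), (2, 'foo', 'anonymous', '11231')]
print(conn.execute("SELECT * FROM IgnoreForDuplicateCustomer").fetchall())
# [(1, 2)]
```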
Detecting the duplicate customers works well with this SQL query:
SELECT cid, firstname, lastname, zip, COUNT(*) AS NumOccurrences
FROM Customer
GROUP BY firstname, lastname, zip
HAVING ( COUNT(*) > 1 )
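Against the example data, this query returns a single group. The sketch below again assumes SQLite via Python's sqlite3 module and repeats the setup so it runs on its own. Note that SQLite accepts the non-grouped `cid` column and fills it from an arbitrary row of the group; stricter engines such as PostgreSQL, or MySQL with ONLY_FULL_GROUP_BY enabled, would reject the query as written.

```python
import sqlite3

# Minimal setup (SQLite assumed): the two example customers and the ignore rule.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (cid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, zip TEXT);
CREATE TABLE IgnoreForDuplicateCustomer (
    cid1 INTEGER REFERENCES Customer(cid),
    cid2 INTEGER REFERENCES Customer(cid)
);
INSERT INTO Customer VALUES (1, 'foo', 'anonymous', '11231'), (2, 'foo', 'anonymous', '11231');
INSERT INTO IgnoreForDuplicateCustomer VALUES (1, 2);
""")

# The duplicate-detection query from above. The bare `cid` is taken from
# an arbitrary row of each group, so it may be 1 or 2 here.
rows = conn.execute("""
    SELECT cid, firstname, lastname, zip, COUNT(*) AS NumOccurrences
    FROM Customer
    GROUP BY firstname, lastname, zip
    HAVING COUNT(*) > 1
""").fetchall()
print(rows)  # one group: foo / anonymous / 11231 with NumOccurrences = 2
```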
My problem is that I am not able to integrate the IgnoreForDuplicateCustomer table so that, as in my previous example, the customers with cid = 1 and cid = 2 are not marked or returned as duplicates, since there is an entry (rule) for them in the IgnoreForDuplicateCustomer table.
So I tried to extend my previous query by adding a WHERE clause:
SELECT cid, firstname, lastname, COUNT(*) AS NumOccurrences
FROM Customer
WHERE cid NOT IN (
    SELECT cid1 FROM IgnoreForDuplicateCustomer WHERE cid2 = cid
    UNION
    SELECT cid2 FROM IgnoreForDuplicateCustomer WHERE cid1 = cid
)
GROUP BY firstname, lastname, zip
HAVING ( COUNT(*) > 1 )
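Running the extended query on the same example data (again assuming SQLite via Python's sqlite3 module, with the setup repeated) still returns the duplicate group. The loop at the end is only a diagnostic I added: it runs the same UNION subquery for each customer with the id bound as a parameter, and for each row it yields the other customer's cid (2 for customer 1, 1 for customer 2), never the row's own cid.

```python
import sqlite3

# Minimal setup (SQLite assumed): the two example customers and the ignore rule.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (cid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, zip TEXT);
CREATE TABLE IgnoreForDuplicateCustomer (
    cid1 INTEGER REFERENCES Customer(cid),
    cid2 INTEGER REFERENCES Customer(cid)
);
INSERT INTO Customer VALUES (1, 'foo', 'anonymous', '11231'), (2, 'foo', 'anonymous', '11231');
INSERT INTO IgnoreForDuplicateCustomer VALUES (1, 2);
""")

# The extended query from above. Inside the subquery, `cid` is not a column
# of IgnoreForDuplicateCustomer, so it refers to the outer Customer row.
filtered = conn.execute("""
    SELECT cid, firstname, lastname, COUNT(*) AS NumOccurrences
    FROM Customer
    WHERE cid NOT IN (
        SELECT cid1 FROM IgnoreForDuplicateCustomer WHERE cid2 = cid
        UNION
        SELECT cid2 FROM IgnoreForDuplicateCustomer WHERE cid1 = cid
    )
    GROUP BY firstname, lastname, zip
    HAVING COUNT(*) > 1
""").fetchall()
print(filtered)  # still one group with NumOccurrences = 2

# Diagnostic: what the subquery returns for each customer row.
for (cid,) in conn.execute("SELECT cid FROM Customer ORDER BY cid").fetchall():
    excluded = conn.execute("""
        SELECT cid1 FROM IgnoreForDuplicateCustomer WHERE cid2 = ?
        UNION
        SELECT cid2 FROM IgnoreForDuplicateCustomer WHERE cid1 = ?
    """, (cid, cid)).fetchall()
    print(cid, excluded)  # prints "1 [(2,)]", then "2 [(1,)]"
```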
Unfortunately, this additional WHERE clause has absolutely no effect on my result. Any suggestions?