search for cross-field duplicates in postgresql

Question

I have a table of contacts. The table contains a mobile_phone column as well as a home_phone column. I'd like to fetch all duplicate contacts where a duplicate is two contacts sharing a phone number. If contact A's mobile_phone matches contact B's home_phone, this is also a duplicate. Here is an example of three contacts that should match.

contact_id|mobile_phone|home_phone|other columns such as email.......|...
-------------------------------------------------------------------------
111       |9748777777  |1112312312|..................................|...
112       |1112312312  |null      |..................................|...
113       |9748777777  |0001112222|..................................|...

Here is a query that finds duplicates, just not across fields.

select mobile_phone from contacts group by mobile_phone HAVING count(*) > 1 order by mobile_phone

Do you want to find duplicates (the involved rows) or just the shared phone numbers? — Erwin Brandstetter, Aug 06 '20 at 02:41

score 0 · Answer 1 · answered Aug 06 '20 at 01:17

0

Use window functions:

select c.*
from (select c.*, count(*) over (partition by mobile_phone) as cnt
      from contacts c
     ) c
where cnt > 1;

answered Aug 06 '20 at 01:17

Gordon Linoff

1,242,037
58
646
786

I need to handle cross-field duplicates – GNG Aug 06 '20 at 01:24
@GNG . . . That is not the question you asked here: "This returns all mobile_phone numbers that appear in multiple rows. I ultimately need the entire row's contents." If you have a question like that, ask a *new* question with appropriate sample data and desired results. – Gordon Linoff Aug 06 '20 at 01:26
Thanks. I included that as a detail to provide context/motivation for the question. – GNG Aug 06 '20 at 01:30

Erwin Brandstetter · Accepted Answer · 2020-08-06T02:59:01.860

0

To find all rows with (cross-column) duplicate phone numbers:

SELECT *
FROM   contacts c
WHERE  EXISTS (
   SELECT FROM contacts x
   WHERE  x.mobile_phone IN (c.mobile_phone, c.home_phone)
       OR x.home_phone   IN (c.mobile_phone, c.home_phone)
   AND x.contact_id <> c.contact_id  -- except self
   );

To find all duplicative phone numbers across the two columns:

SELECT DISTINCT phone
FROM  (
   SELECT mobile_phone AS phone
   FROM   contacts c
   WHERE  EXISTS (
      SELECT FROM mobile_phone x
      WHERE  c.mobile_phone IN (x.mobile_phone, x.home_phone)
      AND    c.contact_id <> x.contact_id  -- except self
      )
   UNION ALL
   SELECT home_phone
   FROM   contacts c
   WHERE  EXISTS (
      SELECT FROM mobile_phone x
      WHERE  c.home_phone = x.home_phone   -- cross-over covered by 1s SELECT
      AND    c.contact_id <> x.contact_id  -- except self
      )
   ) sub;

Repeating the same number in both columns of the same row does not qualify. I don't think you'd want to include those. (Would still be noise that might be worth disallowing with a CHECK constraint.)

edited Aug 06 '20 at 02:59

answered Aug 06 '20 at 02:45

Erwin Brandstetter

605,456
145
1,078
1,228

This works well. But how can I update the query so that it brings back the contact_ids of duplicate contacts? For example | where contact_1 and contact_2 share a phone number ? – GNG Aug 06 '20 at 22:36
If this is not possible, it would be almost as useful if we can bring back the email address that matched. – GNG Aug 06 '20 at 23:05
@GNG All possible, but more expensive. This question has asked for `duplicate contacts` or duplicative phone numbers (like your query suggests. Please ask your new questions as new question. (Please be explicit what you expect from the query.) You can always link to this one for context and drop a comment here to link back and get my attention. – Erwin Brandstetter Aug 06 '20 at 23:13
Okay. Thank you for the nudge. I posted the follow up here: https://stackoverflow.com/questions/63293191/search-for-cross-field-duplicates-in-postgresql-and-bring-back-matched-pairs – GNG Aug 06 '20 at 23:39

search for cross-field duplicates in postgresql

2 Answers2

Linked