0

i am new to SQL and Data analysis. I have a scenario i am trying to identify using SQL partitions. Basically i want to find duplicates [same first_name, last_name, suffix code and Zip code but only if the id's are different.

This query gives me only partial results which is not correct...i know i am missing a filter here and there.

SELECT i.party_id,
    I.FIRST_NM,
    I.LAST_NM,
    I.SFFX_CD,
    A.ZIP_CD,
    ROW_NUMBER() OVER (PARTITION BY I.FIRST_NM,
    I.LAST_NM,
    I.SFFX_CD,
    A.ZIP_CD
    ORDER BY I.PARTY_ID) AS RN 
    FROM INDVDL I,
    PARTY_ADDR A
  WHERE I.PARTY_ID = A.PARTY_ID

i should only get the ones marked with ** and not the rest

PARTY_ID FIRST_NM LAST_NM SFFX_CD ZIP_CD RN

886874  John    Doe Jr. 45402   1
886874  John    Doe Jr. 45406   1
934635  John    Doe Jr. 45406   2
886874  John    Doe Jr. 45415   1
886874  John    Doe Jr. 45415   2
886874  John    Doe Jr. 45415   3
886874  John    Doe Jr. 45415   4
886874  John    Doe Jr. 45415   5
886874  John    Doe Jr. 45415   6
**886874    John    Doe Jr. 45415   7
**934635    John    Doe Jr. 45415   8
934635  John    Doe Jr. 45415   9
934635  John    Doe Jr. 45415   10
user3002581
  • 3
  • 1
  • 4

1 Answers1

0

Here is my suggestion. Use window functions to get the minimum and maximum values of PARTY_ID for the groups you have in mind. Then, filter to return only rows where these are different:

SELECT *
FROM (SELECT i.*, a.*,
             MIN(I.PARTY_ID) OVER (PARTITION BY I.FIRST_NM, I.LAST_NM, I.SFFX_CD, A.ZIP_CD) as min_pi,
             MAX(I.PARTY_ID) OVER (PARTITION BY I.FIRST_NM, I.LAST_NM, I.SFFX_CD, A.ZIP_CD) as max_pi
      FROM INDVDL I JOIN
           PARTY_ADDR A
           ON I.PARTY_ID = A.PARTY_ID
     ) ia
WHERE min_pi <> max_pi;

Note: I fixed your join syntax to use explicit joins. Simple rule: never use commas in the from clause.

Also, I replaced the column lists with * for convenience. Add in the columns you want.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you for the reply. There could however be a scenario where we have more than 2 party_id's within the same partition.Would this query account for that? – user3002581 Mar 21 '15 at 16:46
  • @user3002581 . . . Of course. If there is more than 1, then the minimum and maximum will be different. If the value can be `NULL`, the logic is a bit more complicated, but you can still do something similar. – Gordon Linoff Mar 21 '15 at 17:19