To Remove Duplicates from Netezza Table

Question

I have a scenario for a type2 table where I have to remove duplicates on total row level.

Lets consider below example as the data in table.

A|B|C|D|E  
100|12-01-2016|2|3|4  
100|13-01-2016|3|4|5  
100|14-01-2016|2|3|4  
100|15-01-2016|5|6|7  
100|16-01-2016|5|6|7

If you consider A as key column, you know that last 2 rows are duplicates.

Generally to find duplicates, we use group by function.

select A,C,D,E,count(1) 
from table 
group by A,C,D,E 
having count(*)>1

for this output would be 100|2|3|4 as duplicate and also 100|5|6|7. However, only 100|5|6|7 is only duplicate as per type 2 and not 100|2|3|4 because this value has come back in 3rd run and not soon after 1st load. If I add date field into group by 100|5|6|7 will not be considered as duplicate, but in reality it is.

Trying to figure out duplicates as explained above. Duplicates should only be 100|5|6|7 and not 100|2|3|4. can someone please help out with SQL for the same.

Regards Raghav

you have only two duplicated in result : 100|12-01-2016|2|3|4 > 100|12-01-2016|2|3|4, and 100|15-01-2016|5|6|7 > 100|16-01-2016|5|6|7 — Adam Silenko, Mar 22 '16 at 15:50
@vkp I Meant CDC Type 2, where in you maintain history and current data. — Raghav, Mar 23 '16 at 08:33
I have found answer for the same. I have used lag function to store previous value and sorted it by date as it is Type 2 table. select A,B,lag(C||D||E) over (partition by A order by A,B) as Prev, A||B||C as CURR, case when PREV=CURR then 1 else 0 end as IND from table — Raghav, Mar 23 '16 at 08:39

score 1 · Answer 1 · edited Oct 11 '16 at 16:00

1

Use row_number analytical function to get rid of duplicates.

delete from
(
select a,b,c,d,e,row_number() over (partition by a,b,c,d,e) as rownumb
from table
) as a
where rownumb > 1

edited Oct 11 '16 at 16:00

NzGuy

382
3
14

answered Sep 04 '16 at 12:55

Ajith Acharya

11
1

score 0 · Answer 2 · answered Mar 22 '16 at 15:51

0

if you want to see all duplicated rows, you need join table with your group by query or filter table using group query as subquery.

answered Mar 22 '16 at 15:51

Adam Silenko

3,025
1
14
30

mehtat_90 · Answer 3 · 2016-03-23T15:34:11.087

0

wITH CTE AS (select a, B, C,D,E, count(*) 
from TABLE
group by 1,2,3,4,5
having count(*)>1)

sELECT * FROM cte
WHERE B <> B + 1

Try this query and see if it works. In case you are getting any errors then let me know.

I am assuming that your column B is in the Date format if not then cast it to date

If you can see the duplicate then just replace select * to delete

edited Mar 23 '16 at 15:34

answered Mar 23 '16 at 15:13

mehtat_90

586
9
29

To Remove Duplicates from Netezza Table

3 Answers3