-1

I have a scenario for a type2 table where I have to remove duplicates on total row level.

Lets consider below example as the data in table.

A|B|C|D|E  
100|12-01-2016|2|3|4  
100|13-01-2016|3|4|5  
100|14-01-2016|2|3|4  
100|15-01-2016|5|6|7  
100|16-01-2016|5|6|7  

If you consider A as key column, you know that last 2 rows are duplicates.

Generally to find duplicates, we use group by function.

select A,C,D,E,count(1) 
from table 
group by A,C,D,E 
having count(*)>1

for this output would be 100|2|3|4 as duplicate and also 100|5|6|7. However, only 100|5|6|7 is only duplicate as per type 2 and not 100|2|3|4 because this value has come back in 3rd run and not soon after 1st load. If I add date field into group by 100|5|6|7 will not be considered as duplicate, but in reality it is.

Trying to figure out duplicates as explained above. Duplicates should only be 100|5|6|7 and not 100|2|3|4. can someone please help out with SQL for the same.

Regards Raghav

Raghav
  • 1
  • 1
  • 2
  • what do you mean by type 2? – Vamsi Prabhala Mar 22 '16 at 15:43
  • you have only two duplicated in result : 100|12-01-2016|2|3|4 > 100|12-01-2016|2|3|4, and 100|15-01-2016|5|6|7 > 100|16-01-2016|5|6|7 – Adam Silenko Mar 22 '16 at 15:50
  • @vkp I Meant CDC Type 2, where in you maintain history and current data. – Raghav Mar 23 '16 at 08:33
  • I have found answer for the same. I have used lag function to store previous value and sorted it by date as it is Type 2 table. select A,B,lag(C||D||E) over (partition by A order by A,B) as Prev, A||B||C as CURR, case when PREV=CURR then 1 else 0 end as IND from table – Raghav Mar 23 '16 at 08:39

3 Answers3

1

Use row_number analytical function to get rid of duplicates.

delete from
(
select a,b,c,d,e,row_number() over (partition by a,b,c,d,e) as rownumb
from table
) as a
where rownumb > 1
NzGuy
  • 382
  • 3
  • 14
0

if you want to see all duplicated rows, you need join table with your group by query or filter table using group query as subquery.

Adam Silenko
  • 3,025
  • 1
  • 14
  • 30
0
wITH CTE AS (select a, B, C,D,E, count(*) 
from TABLE
group by 1,2,3,4,5
having count(*)>1)

sELECT * FROM cte
WHERE B <> B + 1

Try this query and see if it works. In case you are getting any errors then let me know.

I am assuming that your column B is in the Date format if not then cast it to date

If you can see the duplicate then just replace select * to delete

mehtat_90
  • 586
  • 9
  • 29