0

I am trying to delete duplicate records from netezza table. But few column contain null value so below code is not working.

DELETE FROM TABLE_NAME a
     WHERE ROW_NUMBER() <> ( SELECT MIN( ROW_NUMBER() )
                               FROM TABLE_NAME b
                              WHERE a.COL1  = b.COL1
                                AND a.COL2  = b.COL2
                                AND a.COL3  = b.COL3);

Sample Data:-
COL1  COL2  COL3
X     NULL     Y
A     NULL     B
X     NULL     Y
X     NULL     Y
E     VAL      F

Expected result:
    COL1  COL2  COL3
    X     NULL     Y
    A     NULL     B
    E     VAL      F

Note: COL2 column contain null value. We have total 30 columns in this table and 6 columns contain null value for duplicate records.

Can anyone please help me on this issue.

Amlan
  • 15
  • 8

4 Answers4

0

You can use the NVL function to translate nulls to something you can compare. *Edit: you commented that NVL doesn't work. Alternatively, you can rewrite the query to explicitly handle NULL:

For instance:

DELETE FROM TABLE_NAME a
     WHERE ROW_NUMBER() <> ( SELECT MIN( ROW_NUMBER() )
                               FROM TABLE_NAME b
                              WHERE((a.COL1 = b.COL1) or (a.COL1 is null and b.COL1 is null))
                                AND ((a.COL2 = b.COL2) or (a.COL2 is null and b.COL2 is null))
                                AND ((a.COL3 = b.COL3) or (a.COL3 is null and b.COL3 is null));
Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • nvl is not working.. ERROR [HY000] ERROR: (2) This form of correlated query is not supported - consider rewriting – Amlan Jan 10 '18 at 10:02
  • Does the DELETE work without the changes handling the nulls? I don't know Netezza, but it seems quite restrictive with subqueries... – Neville Kuyt Jan 10 '18 at 11:01
  • Yes, delete query syntax is correct. but due to null value it cannot delete any records. So the output is '0 records affected'. – Amlan Jan 11 '18 at 01:59
0
DELETE FROM TABLE_NAME a
     WHERE ROW_NUMBER() <> ( SELECT MIN( ROW_NUMBER() )
                               FROM TABLE_NAME b
                              WHERE nvl(a.COL1,0)  = nvl(b.COL1,0)
                                AND nvl(a.COL2,0)  = nvl(b.COL2,0)
                               and nvl(a.COL3,0)  = nvl(b.COL3,0));

Replace null value with 0 using NVL function

Abhi
  • 55
  • 9
  • nvl is not working.. ERROR [HY000] ERROR: (2) This form of correlated query is not supported - consider rewriting – Amlan Jan 10 '18 at 10:04
0

Try using the /=/ operator instead of = It usually works for me in these situations

Lars G Olsen
  • 1,093
  • 8
  • 11
0

For context, what are the distribution columns for the table, how many rows are in your table, and what percentage of those are you expecting to be duplicates? Depending on the scale a CTAS approach might be a better fit than a DELETE.

That being said, here's an approach that get's the delete logic right, but might not be the best performer.

TESTDB.ADMIN(ADMIN)=> select * from table_name;
 COL1 | COL2 | COL3 
------+------+------
 X    |      | Y
 X    |      | Y
 E    | VAL  | F
 A    |      | B
 X    |      | Y
(5 rows)

delete 
from 
        table_name
where   rowid in 
        ( select 
                rowid 
        from 
                ( select 
                        rowid, 
                        row_number() over (
                                   partition by col1, 
                                           col2     , 
                                           col3 
                                   order by 
                                           col1) rn
                from 
                        table_name
                ) foo
        where   rn > 1
        ) ;

DELETE 2
TESTDB.ADMIN(ADMIN)=> select * from table_name;
 COL1 | COL2 | COL3 
------+------+------
 A    |      | B
 X    |      | Y
 E    | VAL  | F
(3 rows)
ScottMcG
  • 3,867
  • 2
  • 12
  • 21