How to remove duplicate rows in SQL Server self join

Question

I am trying to modify one column value on the basis of another row in same table, so I am using self join, I am getting the expect changes, but its selecting both rows (original and modified one), I don't want to select original row, if any of its column is being modified, it should only select modified row, if its values are modified, otherwise it should select original row. I created a similar schema to elaborate the issue:

    create table tbl
    (
        id int not null,    
        status int not null,
        name varchar null,
        subject varchar null,
        result varchar null
    );


    /* Create few records in this table */
INSERT INTO tbl VALUES(1, 1, 'A', 'sub1', 'Pending');
INSERT INTO tbl VALUES(2, 2, 'A', 'all', 'Passed');
INSERT INTO tbl VALUES(3, 1, 'B', 'sub1', 'Pending');
INSERT INTO tbl VALUES(4, 3, 'B', 'sub2', 'Failed');
INSERT INTO tbl VALUES(5, 3, 'C', 'sub1', 'Failed');
INSERT INTO tbl VALUES(6, 2, 'D', 'sub1', 'Passed');
INSERT INTO tbl VALUES(7, 1, 'E', 'sub1', 'Pending');
COMMIT;

    SELECT distinct t1.id, t1.status, t1.name, t1.subject,
    CASE 
        WHEN t1.status = 1 and t2.subject = 'all' and t2.status = 2 THEN 'Passed'
        WHEN t1.status = 1 THEN 'Pending'
        WHEN t1.status = 2 THEN 'Passed'
        WHEN t1.status = 3 THEN 'Failed'
    END AS 'result'
    FROM tbl t1 join tbl t2 on t1.name = t2.name

----- Result ----------------------------

1|1|A|sub1|Pending
1|1|A|sub1|Passed
2|2|A|all|Passed
3|1|B|sub1|Pending
4|3|B|sub2|Failed
5|3|C|sub1|Failed
6|2|D|sub1|Passed
7|1|E|sub1|Pending

Here, the row with ID: 1 is getting duplicated, I don't want the first row anymore since I modified its Result column value to Passed on basis of second row in the original table subject: all (ID: 2).

------ Expected Result-----------------------

1|1|A|sub1|Passed
2|2|A|all|Passed
3|1|B|sub1|Pending
4|3|B|sub2|Failed
5|3|C|sub1|Failed
6|2|D|sub1|Passed
7|1|E|sub1|Pending

Can you also include the actual output you want to see here, based on the four record sample table you gave above? — Tim Biegeleisen, Nov 14 '19 at 15:50
I dont want to put another field, I know here must be some way to solve it. — Ashok Damani, Nov 14 '19 at 16:02

Chris Albert · Answer 1 · 2019-11-14T18:22:55.470

1

The issue with your sample is your join produces one row where subject = sub1 on both sides so your CASE statement does not catch it. Easiest way to modify your query for the desired result is to eliminate that scenario by expanding your join conditions to exclude like statuses.

SELECT distinct t1.id, t1.status, t1.name, t1.subject,
CASE 
    WHEN t1.status = 1 and t2.subject = 'all' and t2.status = 2 THEN 'Passed'
    WHEN t1.status = 1 THEN 'Pending'
    WHEN t1.status = 2 THEN 'Passed'
    WHEN t1.status = 3 THEN 'Failed'
END AS 'result'
FROM tbl t1 left join tbl t2 on t1.name = t2.name AND t1.status <> t2.status

edited Nov 14 '19 at 18:22

answered Nov 14 '19 at 16:35

Chris Albert

2,462
8
27
31

This shows expected result for above example. But for some reason it doesn't work in my real scenario. I am trying to find what's the different. Thank you for your answer. – Ashok Damani Nov 14 '19 at 16:51
Sounds good. Please expand you example when you find the anomaly. – Chris Albert Nov 14 '19 at 16:52
Hey, this is not correct probably, please check my updated question, when there are more records with same status, it just eliminates those, which is not expected. – Ashok Damani Nov 14 '19 at 16:58
Updated my answer. Changed join to left join so it retains the records without multiples. It outputs your expected result. – Chris Albert Nov 14 '19 at 18:23
yea, that works, but I think more readable is the one, that I have posted as answer, check that out. Anyways, thanks for your attention. – Ashok Damani Nov 15 '19 at 10:48

score 0 · Answer 2 · answered Nov 14 '19 at 16:37

0

In fact expected result will look like this based on case statement

1|1|A|sub1|Pending
2|2|A|all|Passed
3|1|B|sub1|Pending
4|3|B|sub2|Failed

the duplication happened in your first query was due to query match from two table based on name only where it suppose to match by both name and id so to remove duplication and get expected result use the following query

FROM tbl t1 join tbl t2 on t1.name = t2.name  and t1.id = t2.id

answered Nov 14 '19 at 16:37

Saif

2,611
3
17
37

1

fyi: this is not expected result, please check once. – Ashok Damani Nov 14 '19 at 16:38

score 0 · Answer 3 · answered Nov 14 '19 at 16:40

You need to just add one more where condition t1.subject <> t2.subject Try following.

SELECT distinct t1.id, t1.status, t1.name, t1.subject,
CASE 
    WHEN t1.status = 1 and t2.subject = 'all' and t2.status = 2 THEN 'Passed'
    WHEN t1.status = 1 THEN 'Pending'
    WHEN t1.status = 2 THEN 'Passed'
    WHEN t1.status = 3 THEN 'Failed'
END AS 'result'
FROM tbl t1 join tbl t2 on t1.name = t2.name and t1.subject <> t2.subject

score 0 · Answer 4 · answered Nov 14 '19 at 17:19

0

Use left join:

SELECT t1.id, t1.status, t1.name, t1.subject, ISNULL(t2.result, t1.result) result
FROM tbl t1 
left join tbl t2 on t1.name = t2.name and t2.subject = 'all' and t2.status = 2

answered Nov 14 '19 at 17:19

Valeriy Tesher

1

score 0 · Answer 5 · answered Nov 14 '19 at 17:22

Looking at the query what is found is, The row is not basically duplicating but appearing in your result because of the Join and CASE statement.

In your query, following part is problematic.

    WHEN t1.status = 1 and t2.subject = 'all' and t2.status = 2 THEN 'Passed'
    WHEN t1.status = 1 THEN 'Pending'

your first case statement is giving this result.

   1    1   "A" "sub1"  "Passed"

and Second case statement is giving following result. Because you wanted result as 'Pending' for status =1. But due to join the Id=1 row is also appearing.

   1    1   "A" "sub1"  "Pending"
   3    1   "B" "sub1"  "Pending"

You wanted to exclude the particular row. which can be eliminated by using EXCEPT operator.

I have modified your query using EXCEPT where I have eliminated unwanted row.

SELECT distinct t1.id, t1.status, t1.name, t1.subject,
CASE 
    WHEN (t1.status = 1 and t2.subject = 'all' and t2.status = 2 ) THEN 'Passed' 
    WHEN (t1.status = 1 and t1.id <> 1) THEN 'Pending'
    WHEN t1.status = 2 THEN 'Passed'
    WHEN t1.status = 3 THEN 'Failed'
END as 'result'
FROM tbl t1 join tbl t2 on t1.name = t2.name 
EXCEPT 
SELECT distinct t1.id, t1.status, t1.name, t1.subject,null as 'result'
FROM tbl t1  join tbl t2 on t1.name = t2.name 
where t1.status=1 and t1.name='A' and t1.subject='sub1' and t2.status=1

Now you can find below result as per you wanted.

3   1   "B" "sub1"  "Pending"
1   1   "A" "sub1"  "Passed"
4   3   "B" "sub2"  "Failed"
2   2   "A" "all"   "Passed"

Two things: First, this is not the expected answer, please check once in question. Second, You are using hardcoded values in where condition, which is not suitable for my scenario, it should be generic. — Ashok Damani, Nov 14 '19 at 17:25
@AshokDamani, I have used only some inserts rather than inserting all the values. Based on few insert i have eliminated the unwanted row. This answer is just for hinting the expected answers. Rest you would need to do your research on it. — Raj Paliwal, Nov 14 '19 at 17:37

score 0 · Answer 6 · answered Nov 15 '19 at 10:46

Finally resolved it!

with T as (
SELECT distinct t1.id, t1.status, t1.name, t1.subject,
    CASE 
        WHEN t1.status = 1 and t2.subject = 'all' and t2.status = 2 THEN 'Passed'
        WHEN t1.status = 1 THEN 'Pending'
        WHEN t1.status = 2 THEN 'Passed'
        WHEN t1.status = 3 THEN 'Failed'
    END AS 'result'
    FROM tbl t1 join tbl t2 on t1.name = t2.name
)
select * from T 
group by T.name, T.subject
order by T.id

score -1 · Answer 7 · answered Nov 14 '19 at 16:29

I would look to break this up a bit into different steps. I converted this to temp tables so i could run it easier in my environment.

seems to be yielding the results your looking for.

create table #tbl
(
    id int not null,    
    status int not null,
    name varchar(20) null,
    subject varchar(20) null,
    result varchar(20) null
);


/* Create few records in this table */
INSERT INTO #tbl VALUES(1, 1, 'A', 'sub1', 'Pending');
INSERT INTO #tbl VALUES(2, 2, 'A', 'all', 'Passed');
INSERT INTO #tbl VALUES(3, 1, 'B', 'sub1', 'Pending');
INSERT INTO #tbl VALUES(4, 3, 'B', 'sub2', 'Failed');
--COMMIT;

select distinct id, status, name, subject
into #finaltbl
from #tbl

alter table #finaltbl
add result varchar(50)

update #finaltbl 
set result = case when (select count(1) from #tbl t 
                            where f.name = t.name 
                            and (f.subject = t.subject or t.subject = 'all')
                            and t.result = 'passed') > 0 then 'Passed' 
             when (select count(1) from #tbl t 
                            where f.name = t.name 
                            and (f.subject = t.subject or t.subject = 'all')
                            and t.result = 'failed') > 0 then 'Failed' 
             when (select count(1) from #tbl t 
                            where f.name = t.name 
                            and (f.subject = t.subject or t.subject = 'all')
                            and t.result = 'pending') > 0 then 'pending' end
from #finaltbl f

select * from #finaltbl

drop table #tbl

drop table #finaltbl

How to remove duplicate rows in SQL Server self join

7 Answers7