SQL select join in same table by comparing 2 fields

Question

I have the following table with these records as example:

+------+---------+---------+
| key  | amount1 | amount2 |
|------|---------|---------|
| A    | 100     | 0       |
| B    | 0       | 100     |
| C    | 100     | 0       |
| D    | 66      | 34      |
| E    | 99      | 12      |
| F    | 100     | 12      |
| G    | 12      | 99      |
+------+---------+---------+

I'd like to delete all the records where fields amount1 of one row = amount2 of another row, but only by pairs : If I find 2 records with amount1 = amount2 then I can delete them, if I find a third record I have to keep it.

Example : The 1st record above with key = A has amount1 = 100 and amount2 = 0, the pair here will be key B where amount1 = 0 and amount2 = 100. row with key = C must be kept.

+------+---------+---------+
| key  | amount1 | amount2 |
+------+---------+---------+
| A    | 100     | 0       | 
| B    | 0       | 100     |  Pair found with key = A : Delete key = A and key = B
| C    | 100     | 0       |  No pair found as the 2 first records compose a pair
| D    | 66      | 34      |  No pair found
| E    | 99      | 12      |  No pair found
| F    | 100     | 12      |  No pair found
| G    | 12      | 99      |  Pair found with key = E : Delete records with key = E and key = G
+------+---------+---------+

The expected result is the below one :

+------+---------+---------+
| key  | amount1 | amount2 |
+------+---------+---------+
| C    | 100     | 0       |
| D    | 66      | 34      |
| F    | 100     | 12      |
+------+---------+---------+

So either I would like to determine all the pairs then delete them, or display directly only the rows with no pair.

Any clue ?

Thanks in advance for your help,

Do you just need amount1 in row A to equal amount2 in row B, or do you ALSO need amount2 in row A to equal amount1 in row B, at the same time, before you can delete rows A and B? For example, let's assume row E didn't exist. Could you simply delete rows F and G, because row G amount1 equals row R amount2 (they are both 12), even though the "other value" is 100 in one row and 99 in the other row? — , Jul 21 '16 at 11:53
also, strictly speaking, by your description if a row has amount1 = amount2 then the row could cancel itself. Is that the desired behavior, or should the row always cancel a **different** row? — , Jul 21 '16 at 11:57
@mathguy A and C are not a pair. A and B, and B and C are pairs — Strawberry, Jul 21 '16 at 12:01
Hi @mathguy, Yes it should be rowA-amount1 = rowB-amount2 AND rowA-amount2 = rowB-amount1. For your second question, by considering row E doesn't exist, yes, as rowF-amount1 != rowG-amount2 even if rowF-amount2 = rowG-amount1, I have to keep both of them — Walid, Jul 21 '16 at 12:03
@Strawberry you're right, for rows A, B and C, we have 2 pairs, but I have to delete (or select) only the first found pair (A and B). If I had an extra row key = Z with amount1 = 0 and amount2 = 100, then I can consider C is paired with Z — Walid, Jul 21 '16 at 12:07
@Jens, I am an SAP developer, so I'll code in ABAP for this requirement, which means it should be in native SQL if possible, without use of stored procedure — Walid, Jul 21 '16 at 12:10
I'd like to point out that there's no inherent concept of "order" in relational databases so the choice to keep C and not A looks very arbitrary. Is there any specific logic behind it? — apokryfos, Jul 21 '16 at 12:11
So, if a row has an even number of of pairs, keep its maximum pair. If a row has an odd number of pairs, delete them all. If a row has no pairs, keep it. — Strawberry, Jul 21 '16 at 12:15
Your question is either for a single particular DBMS, or it's about portable SQL and therefore not specific to any. Please fix your tags accordingly. — underscore_d, Jul 21 '16 at 12:22

score 0 · Answer 1 · answered Jul 21 '16 at 12:20

One possible way to do it, and not particularly nice.

DELETE a
FROM some_table a
INNER JOIN
(
    SELECT a.amount1  AS am1, 
            a.amount2  AS am2,
            b.amount1  AS bm1,
            b.amount2  AS bm2,
            MIN(a.`key`) AS del_key
    FROM some_table a
    INNER JOIN some_table b
    ON a.amount1 = b.amount2
    AND a.amount2 = b.amount1
    GROUP BY a.amount1, 
            a.amount2,
            b.amount1,
            b.amount2
    UNION
    SELECT a.amount1  AS am1, 
            a.amount2  AS am2,
            b.amount1  AS bm1,
            b.amount2  AS bm2,
            MIN(b.`key`) del_key
    FROM some_table a
    INNER JOIN some_table b
    ON a.amount1 = b.amount2
    AND a.amount2 = b.amount1
    GROUP BY a.amount1, 
            a.amount2,
            b.amount1,
            b.amount2
) b
ON a.key = b.del_key

This gets the matching amounts and the min value of the keys for each one. Then uses this as a sub query to join against the original table as part of DELETE statement.

One situation that this query does not seem to cover is this: You have two rows with amounts (12, 99) and five rows with amounts (99, 12). The result is that the two rows with amounts (12, 99) and exactly two of the five rows with amounts (99, 12) should be deleted. I believe this solution will only delete one of each, still leaving another pair that could (should?) be deleted. — , Jul 21 '16 at 12:27
@mathguy - you are quite right, it deletes the first duplicates. Given the demo data not 100% certain what the OP wants. Plus point is it is fairly standard SQL — Kickstart, Jul 21 '16 at 12:40

score 0 · Answer 2 · 2016-07-21T12:32:51.267

Something like this. Note the extra effort needed to remove pairs of rows that have amount1 = amount2. I added a few extra rows in my test data to test proper working of the query.

I illustrate (using the MINUS operator) the way to show just the "remaining" rows in a query, after "deleting" (MINUS) the paired rows. Something similar can be used for a DELETE statement - although perhaps a better practice is to have an additional column flag to just flag the paired rows, instead of deleting them (in real life you don't want to simply delete historical information).

ADDED: This solution is for Oracle; I don't know what changes (if any) may be needed in other DB products. Also: note that in this solution, if you have two rows with amounts (12,99) and five with (99,12), what is left after "deletions" is three rows with (99,12). It seems to me that this is the logical interpretation of the requirements in this case.

with
     input_data ( key, amount1, amount2 ) as (
       select     'A',     100,       0   from dual union all
       select     'B',       0,     100   from dual union all
       select     'C',     100,       0   from dual union all
       select     'D',      66,      34   from dual union all
       select     'E',      99,      12   from dual union all
       select     'F',     100,      12   from dual union all
       select     'G',      12,      99   from dual union all
       select     'H',     200,     200   from dual union all
       select     'I',     200,     200   from dual union all
       select     'J',     200,     200   from dual union all
       select     'K',     300,     300   from dual
     ),
     prep ( key, amount1, amount2, rn ) as (
       select key, amount1, amount2, 
              row_number() over (partition by amount1, amount2 order by key)
       from   input_data
     )
select   key,   amount1,   amount2
from   input_data
minus
select a.key, a.amount1, a.amount2
from   prep a inner join prep b
              on
                 (
                       a.amount1    = b.amount2
                   and a.amount2    = b.amount1
                   and a.amount1   != a.amount2
                   and a.rn         = b.rn
                 )
                 or
                 (
                       a.amount1    = b.amount2
                   and a.amount2    = b.amount1
                   and a.amount1    = b.amount1
                   and 
                       (
                             mod(a.rn, 2) = 1
                         and b.rn         = a.rn + 1 
                       )
                       or
                       (
                             mod(a.rn, 2) = 0
                         and b.rn         = a.rn - 1                       
                       )
                 )
;

KEY    AMOUNT1    AMOUNT2
--- ---------- ----------
D           66         34
F          100         12
J          200        200
K          300        300

SQL select join in same table by comparing 2 fields

2 Answers2