Get all rows where two ids appear the first time

Question

i need help with a problem that drives me nuts.
Imagine you have the following table with just three ids.

CREATE TABLE candidates
(
    row_num BIGINT,
    in_id  BIGINT,
    out_id BIGINT
);

Fill it with some data:

INSERT INTO candidates (row_num, in_id, out_id)
VALUES (1, 220, 210),
       (2, 208, 210),
       (3, 216, 210),
       (4, 220, 214),
       (5, 208, 214),
       (6, 216, 214),
       (7, 212, 218);

What i now want is to select all rows in a way that each in_id and out_id is unique across all selected rows.
So we'd select the first row, because we didn't see either the in_id 220 nor the out_id 210 before. The pair (220,210) is selected and therefore no following row is allowed to have the in_id 220 or the out_id 210. That's why we need to skip the next 3 rows (they either have in_id 220 or out_id 210). Now we end up at row_num 5. Neither 208 nor 214 where selected before. Thats why it also should be part of the final result set. Same goes for row_num 7.

The final result set should look like:

row_num | in_id | out_id
   1       220      210
   5       208      214
   7       212      218

Seems so simple but i can't solve it. Cleanest try is the following query which has the problem that despite that it knows which ids appeared before it can't detect if a row was selected or not. Breaks for the case of row_num 5.

SELECT *
FROM candidates tc1
WHERE NOT EXISTS (
  SELECT
  FROM candidates tc2
  WHERE (tc2.in_id = tc1.in_id OR tc2.out_id = tc1.out_id)
    AND tc2.row_num < tc1.row_num
);

Any help greatly appreciated, thanks!

The problem is not deterministic since there are many solution to it. For example it could start with (216, 210). — The Impaler, Apr 01 '23 at 13:26
It can be done using arrays or strings, but it will somehow limit the number of rows: 1) How many rows does the candidate table have? 2) How many rows is the expected result? — The Impaler, Apr 01 '23 at 13:27
@TheImpaler well the order is determined by row_num. Top to bottom. Does this make sense? — Tom, Apr 01 '23 at 16:29

score 1 · Accepted Answer · answered Apr 01 '23 at 13:27

1

This can be done using WITH recursive :

with recursive cte as (
  select row_num, in_id, out_id, 1 as exist
  from candidates
  where row_num = 1
  union all
  select t.row_num, t.in_id, t.out_id, exist+1
  from cte c
  inner join candidates t on t.row_num > c.row_num
  and (t.in_id <> c.in_id and t.out_id <> c.out_id)
)
select t.row_num, t.in_id, t.out_id
from candidates t
inner join (
  select min(row_num) as row_num, exist
  from cte
  group by exist
) as s on s.row_num = t.row_num

The recursive CTE used to get records recursively by joining the previous records with the current table where current row bigger than already found records and current in_id and out_id not exist in previous records.

Demo here

answered Apr 01 '23 at 13:27

SelVazi

10,028
2
13
29

1

That's it!! Thank you so much! Very nice approach! – Tom Apr 01 '23 at 15:52
1

Ok, but this example doesn't seem to work. Do you know what's wrong? https://dbfiddle.uk/a7YY86XK – Tom Apr 01 '23 at 19:16
The order should be respected by row_id ? – SelVazi Apr 01 '23 at 21:51
1

Exactly. row_num defines the order and we need to process from top to bottom – Tom Apr 01 '23 at 21:53
Its not easy to make it using a simple query if data not sorted, this need a procedure – SelVazi Apr 02 '23 at 15:11

Get all rows where two ids appear the first time

1 Answers1