My answer assumes that the target is without dupes, and that we want to insert a new set - which happens to be a duplicate. I choose the group of 4 with the id_comb
of 1.
You would have to put the group of 4 into a staging table. Then, you have to pivot both staging and target horizontally - so that you get 5 columns named attr_pos1
to attr_pos5
(the biggest group in your example is 5). To pivot, you need a sequence number, which we get by using ROW_NUMBER(). That's for both tables, staging and target. Then, you pivot both. Then, you try to join pivoted staging and target on all 5 attr_pos#
columns, and count the rows. If you get 0, you have no duplicates. If you get 1, you have duplicates.
Here's the whole scenario:
WITH
-- input section: a) target table, no dupes
target(id_comb,attr_pos) AS (
SELECT 2,1
UNION ALL SELECT 2,2
UNION ALL SELECT 2,3
UNION ALL SELECT 2,4
UNION ALL SELECT 3,1
UNION ALL SELECT 3,2\
UNION ALL SELECT 3,3
UNION ALL SELECT 3,4
UNION ALL SELECT 3,5
UNION ALL SELECT 4,1
UNION ALL SELECT 4,2
UNION ALL SELECT 4,3
)
,
-- input section: b) staging, input, would be a dupe
staging(id_comb,attr_pos) AS (
SELECT 1,1
UNION ALL SELECT 1,2
UNION ALL SELECT 1,3
UNION ALL SELECT 1,4
)
,
-- query section:
-- add sequence numbers to stage and target
target_s AS (
SELECT
ROW_NUMBER() OVER(PARTITION BY id_comb ORDER BY attr_pos) AS seq
, *
FROM target
)
,
staging_s AS (
SELECT
ROW_NUMBER() OVER(PARTITION BY id_comb ORDER BY attr_pos) AS seq
, *
FROM staging
)
,
-- horizontally pivot target, NULLS as -1 for later join
target_h AS (
SELECT
id_comb
, IFNULL(MAX(CASE seq WHEN 1 THEN attr_pos END),-1) AS attr_pos1
, IFNULL(MAX(CASE seq WHEN 2 THEN attr_pos END),-1) AS attr_pos2
, IFNULL(MAX(CASE seq WHEN 3 THEN attr_pos END),-1) AS attr_pos3
, IFNULL(MAX(CASE seq WHEN 4 THEN attr_pos END),-1) AS attr_pos4
, IFNULL(MAX(CASE seq WHEN 5 THEN attr_pos END),-1) AS attr_pos5
FROM target_s
GROUP BY id_comb ORDER BY id_comb
)
,
-- horizontally pivot staging, NULLS as -1 for later join
staging_h AS (
SELECT
id_comb
, IFNULL(MAX(CASE seq WHEN 1 THEN attr_pos END),-1) AS attr_pos1
, IFNULL(MAX(CASE seq WHEN 2 THEN attr_pos END),-1) AS attr_pos2
, IFNULL(MAX(CASE seq WHEN 3 THEN attr_pos END),-1) AS attr_pos3
, IFNULL(MAX(CASE seq WHEN 4 THEN attr_pos END),-1) AS attr_pos4
, IFNULL(MAX(CASE seq WHEN 5 THEN attr_pos END),-1) AS attr_pos5
FROM staging_s
GROUP BY id_comb ORDER BY id_comb
)
SELECT
COUNT(*)
FROM target_h
JOIN staging_h USING (
attr_pos1
, attr_pos2
, attr_pos3
, attr_pos4
, attr_pos5
);
Hope this helps ----
Marco