I have three tables:
x (column1,..., x_id)
y (column1, column2, colum3, ..., x_id, y_id)
z (column1, column2, colum3,..., x_id, y_id, z_id)
My final goal: Delete duplicates, check the y table for duplicates, and if found check a second table for those duplicates, to see if the resultset is also the same, and if it is then delete one of them. I can join the tables but both of them are really big, so that is my main concern.
For example, This is just an example.
with cte as (
select column1, column2, y_id, count(*) as cou from y
group by column1, column2, x_id,
having count(*) = 2
)
select distinct tabel1.* from y tabel1
join cte tabel2 on tabel1.x_id= tabel2.x_id and tabel1.column1 = tabel2.column1 and tabel1.column2 = tabel2.column2
Then I want to check these found duplicates and if their items in another table are exactly the same. if it is then delete one of them.
For more clarification consider this example: y table: my table structure is very similar to this and this table is close to the reality of what I have.
column 1 | column 2 | x_id | y_id |
---|---|---|---|
foo | bar | 1 | 1 |
foo | bar | 1 | 2 |
xx | yy | 2 | 3 |
zz | kk | 2 | 4 |
tt | mm | 2 | 5 |
baz | qux | 3 | 6 |
baz | qux | 3 | 7 |
has two groups with the same column 1 and column 2 and x_id values (the y_ids = (1 and 2) and y_id == (6,7).
Then for these rows, I want to check a second table to see if all the items are found in each other. meaning they have the same items in the z table. The items of the y_id = 1 and y_id = 2 are the same, so one of them in y table should be deleted.
{color:red;} column 1 | column 2 | column | x_id | y_id | z_id |
---|---|---|---|---|---|
foo | bar | qux | 1 | 1 | 1 |
foo2 | bar2 | qux2 | 1 | 1 | 2 |
foo | bar | qux | 1 | 2 | 3 |
foo2 | bar2 | qux2 | 1 | 2 | 4 |
baz | foo | qux | 3 | 6 | 5 |
baz2 | foo2 | qux1 | 3 | 6 | 6 |
baz3 | foo3 | qux2 | 3 | 6 | 7 |
baz | foo | qux | 3 | 7 | 8 |
baz2 | foo2 | qux1 | 3 | 7 | 9 |
I want to delete the row with y_id = 1 from the y table ( or y_id = 2, which duplicate row doesn't make a difference). The tables are very big so a fast solution would be great. I know I can join the tables, but I was hoping I can find another solution. A solution that can use the group by ids instead and just look for the found items there inside the z table (third table)