Associate joined table rows with same parent table rows if a value of field in parent row is same

Question

I have data in two tables. City and Address.

City

id - name - code
1  - abc  - 1
2  - xyz  - 2
3  - efg  - 2

Address

id - city_id - name
1  -   1     - wer
2  -   2     - sdf
3  -   3     - tyu

Now code for cities with id 2 and 3 are the same. I want to delete the city with id 3 and change city_id 3 to 2 in the Address table so that the last two rows of Address table are associated with city id 2

In other words, I want to delete duplicate rows in the parent table and update the child table accordingly. There are thousands of such rows in my 2 tables.

Can you elaborate more, like what should be condition to delete duplicate filed in city table. Same code in city table that means duplicate entry then how are you going to determine which entry to keep and which to delete? — Jay Godhani, Apr 23 '22 at 06:46
Will keep the first instance of city, and delete the rest. Let's the rows are ordered by id in city — Sikander, Apr 23 '22 at 07:02

Jay Godhani · Answer 1 · 2022-04-23T08:39:37.867

If you are going to do this multiple times then I would suggest you to create a Stored Procedure and call it whenever you need it.

CREATE OR REPLACE PROCEDURE delete_dulicate_city()
LANGUAGE 'plpgsql'
AS $$
BEGIN
    DROP TABLE IF EXISTS temp_city;

    CREATE TEMPORARY TABLE IF NOT EXISTS temp_city
    (
        id_to                      INT,
        id_from                    INT
    );
        
    INSERT INTO temp_city
    SELECT c1.id id_to, c2.id id_from 
    FROM
    (
        SELECT id, code, row_number() over(partition BY code ORDER BY id) rn
        FROM city
    ) c1 
    INNER JOIN city c2 
    ON  c1.rn = 1
    AND c1.code=c2.code 
    AND c2.id > c1.id;

    UPDATE address a
    SET city_id = tc.id_to
    FROM temp_city tc
    WHERE a.city_id=tc.id_from;

    DELETE FROM city 
    WHERE id IN (SELECT id_from FROM temp_city);
END;
$$;

You can call it whenever you need just by executing CALL delete_duplicate_city();

https://onecompiler.com/postgresql/3xzy48uq4

Serg · Answer 2 · 2022-04-23T07:21:30.617

Create a table of pairs to be processed (cc), then update Address and delete from City accordingly

   select c1.id idto, c2.id idfrom into cc 
   from (
     select id,  code,  row_number() over(partition by code order by id) rn
     from city 
   ) c1 
   join city c2 on c1.rn = 1 and c1.code = c2.code and c2.id > c1.id;

update Address a
set city_id = cc.idto
from  cc
where a.city_id=cc.idfrom;

delete from city 
where id in (select idfrom from cc);

db<>fidle

Associate joined table rows with same parent table rows if a value of field in parent row is same

2 Answers2