Duplicates - Deletion under PostgreSQL

Question

I am working on tables where primary keys are sometimes not defined.

I have observed the following behavior while trying to delete duplicates using the row deletion button of the interface: Deletion of one row (1 of the duplicate) leads to deletion of all similar rows (loss of the data) under Navicat.

Example: In table table_test containing 2 duplicates Deletion of row 1 leads to deletion of rows 1 and 2.

table_test

a	b	c
A	A	A
A	A	A
B	B	B
B	B	B

Q1: Could you explain me the behavior of the database when there is no primary key? Why are those rows treated similarly internally? (I know that there is a link with the fact that there is no unique identifier but more precisely.)

Q2: Under DBeaver, it is possible to delete the duplicates in defining a virtual primary key on all columns. How does the virtual primary key work? It is not possible to define a primary key on columns (a, b, c) due to the presence of duplicates. How is the 'virtual' primary key created?

I _guess_ DBEaver uses the internal (Postgres specific) `ctid` column for the WHERE clause. — , Nov 28 '22 at 09:08
Please ask 1 (specific researched non-duplicate) question. PS Your questions are not clear. (Including: Please avoid vague terms like "similarly" & "in reality". Putting scare quotes around terms doesn't explain the idiosyncratic meaning you don't give. What exactly do you mean by how it "works"?) PS [mre] PS [Why should I not upload images of code/data/errors when asking a question?](https://meta.stackoverflow.com/q/285551/3404097) — philipxy, Nov 28 '22 at 09:27
For Q1, I discover that the SQL script behind the row deletion is written this way: DELETE FROM public.table_test WHERE a='A' AND b='A' AND c='A'; So all similar rows disappear. I am still interested by the way DBeaver is managing the virtual PK (found little info in DBeaver documentation) — Sylvainjoon, Nov 28 '22 at 10:12

score 0 · Accepted Answer · answered Nov 28 '22 at 20:33

I would question Q2; DBeaver does not do that on its own. The following gives does what you are after (run in DBeaver - but does not depend on it). (see demo). It generates a virtual id for each column combination using the Window Function row_number(), and selects the ctid for each of those rows. Note the ctid is guaranteed to be unique. (See demo)

delete 
  from test_table 
 where ctid in (select ctid 
                  from (select ctid
                             , row_number() over (partition by a, b, c) rn 
                          from test_table
                    ) gctid
                where rn > 1
               );

Duplicates - Deletion under PostgreSQL

1 Answers1