Delete from table A joining on table A in Redshift

Question

I am trying to write the following MySQL query in PostgreSQL 8.0 (specifically, using Redshift):

DELETE t1 FROM table t1
LEFT JOIN table t2 ON (
    t1.field   = t2.field AND
    t1.field2  = t2.field2
)
WHERE t1.field > 0

PostgreSQL 8.0 does not support DELETE FROM table USING. The examples in the docs say that you can reference columns in other tables in the where clause, but that doesn't work here as I'm joining on the same table I'm deleting from. The other example is a subselect query, but the primary key of the table I'm working with has four columns so I can't see a way to make that work either.

What error/results are you seeing. First thing I see that seems wrong with your query is that it should be "DELETE FROM" (removing "t1") — David S, Apr 30 '14 at 16:29
Yeah sure, that's a mysql query I don't expect it to just work in postgresql. I just haven't been able to find an equivalent in Postgresql 8.0 at all. The syntax for deletes is very minimal. — moinudin, Apr 30 '14 at 16:59
Are you sure the MySQL query is doing what you want? The `LEFT JOIN` seems completely useless. — Erwin Brandstetter, Apr 30 '14 at 17:13
@ErwinBrandstetter Yeah, you're right, the left join is useless. Join would be the correct behaviour. I'm migrating code that was written ages ago. — moinudin, Apr 30 '14 at 20:07
You should describe the exact purpose of the query. The broken MySQL query doesn't cut it. — Erwin Brandstetter, Apr 30 '14 at 20:27
@ErwinBrandstetter Sure. The purpose is to delete records from a table , with a where clause that refers to other rows in the same table. The full, unsimplified use case is that I want to remove records where there is another record in the table with 3 fields matching, and 1 field falls within a range of 60. — moinudin, Apr 30 '14 at 22:08
The description is an essential part of the question and should not be hidden in the comments, edit the question and make the example match the text. This also *needs* the table definition. To identify "another record" we need the primary key. We also need to see `NOT NULL` constraints. With 37k XP you should be well aware of the basics here .. — Erwin Brandstetter, Apr 30 '14 at 22:14

Erwin Brandstetter · Answer 1 · 2022-12-19T05:47:43.240

16

Amazon Redshift was forked from Postgres 8.0, but is a very much different beast. The manual informs, that the USING clause is supported in DELETE statements:

Just use the modern form:

DELETE FROM tbl
USING  tbl t2
WHERE  t2.field  = tbl.field
AND    t2.field2 = tbl.field2
AND    t2.pkey  <> tbl.pkey    -- exclude self-join
AND    tbl.field > 0;

This is assuming JOIN instead of LEFT JOIN in your MySQL statement, which would not make any sense. I also added the condition AND t2.pkey <> t1.pkey, to make it a useful query. This excludes rows joining itself. pkey being the primary key column.

What this query does:
Delete all rows where at least one other row exists in the same table with the same not-null values in field and field2. All such duplicates are deleted without leaving a single row per set.

To keep (for example) the row with the smallest pkey per set of duplicates, use t2.pkey < t2.pkey.

An EXISTS semi-join (as @wilplasser already hinted) might be a better choice, especially if multiple rows could be joined (a row can only be deleted once anyway):

DELETE FROM tbl
WHERE  field > 0
AND    EXISTS (
   SELECT 1
   FROM   tbl t2
   WHERE  t2.field  = tbl.field
   AND    t2.field2 = tbl.field2
   AND    t2.pkey  <> tbl.pkey 
   );

edited Dec 19 '22 at 05:47

answered Apr 30 '14 at 17:24

Erwin Brandstetter

605,456
145
1,078
1,228

That gets us close, but it only works if `t1` and `t2` are different. In my case, they are the same. Sorry if I wasn't clear, but I intended for `table` to be the table name and `t1`/`t2` to be aliases. Thanks for pointing out the syntax was different I guessed it was the same! – moinudin Apr 30 '14 at 18:32
Funny: the `AND t2.pkey <> t1.pkey` condition still will delete all satisfying tuples (except if no duplicates exist) Maybe you meant `AND t2.pkey < t1.pkey` ?? – wildplasser Apr 30 '14 at 20:38
@wildplasser: Well, yes. That's the purpose of the query as I guessed it for lack of definition. I added a few lines to clarify. – Erwin Brandstetter Apr 30 '14 at 20:49
We both could be right. (or wrong) the intention of the OP is still totally unclear, at least to me. – wildplasser Apr 30 '14 at 20:54
@wildplasser: Strictly speaking, your simplified version is the correct answer. It just doesn't make any sense, so I provided an answer that might make more sense. – Erwin Brandstetter Apr 30 '14 at 20:57
3

In redshift you are not allowed to alias the target table, ie you must code `DELETE FROM tbl1 USING ...` – Bohemian Jun 16 '17 at 00:48
The modern form is an excellent answer. The reason many people like me would do any join is because the table data requirements are not materialised yet i.e. join (insert sub query here) t2 on ... and so on. USING statement is an excellent choice, just need to materialise the table somewhere else beforehand, either a cte, temp or physical table. Thanks. – softdevlife Oct 17 '18 at 10:09

wildplasser · Accepted Answer · 2021-02-11T20:25:26.870

I don't understand the mysql syntax, but you probably want this:

DELETE FROM mytablet1
WHERE t1.field > 0
   -- don't need this self-join if {field,field2}
   -- are a candidate key for mytable
   -- (in that case, the exists-subquery would detect _exactly_ the
   -- same tuples as the ones to be deleted, which always succeeds)
-- AND EXISTS (
--     SELECT *
--     FROM mytable t2 
--     WHERE t1.field = t2.field
--     AND t1.field2  = t2.field2
--    )
    ;

Note: For testing purposes, you can replace the DELETE keyword by SELECT * or SELECT COUNT(*), and see which rows would be affected by the query.

I think this could be exactly what I need. Let me confirm! – moinudin Apr 30 '14 at 22:09 — moinudin, Apr 30 '14 at 22:09

Delete from table A joining on table A in Redshift

2 Answers2