Schema
So you have a table with versioned objects which holds change records associated to the object with some details and a date.
Now you want to select
- the first change per object
- the oldest (within the GROUP of this object's changes)
- using
MIN
function on a DATE
column-type
This oldest should be retained/kept and stay. All other object change-versions should be deleted.
Solving
A. Selecting the fist/oldest changes per object in 2 steps.
- Select the MIN(date) per object:
SELECT Object_ID, COUNT(Object_ID) AS Count_Changes, MIN(Date_of_Change) AS First_Change
FROM table
GROUP BY Object_ID
Resultset contains each object with the total count of changes and the date of the first change.
- Select the first changes using previous result as subquery in a
JOIN
:
SELECT *
FROM table t
-- join with a table-subquery having only 2 columns to correlate
JOIN (
SELECT Object_ID, MIN(Date_of_Change) AS First_Change
FROM table
WHERE Current_Step is NULL and Change = 'change'
GROUP BY Object_ID
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change = m.First_Change
WHERE Current_Step is NULL and Change = 'change'
This are the rows to keep and not remove. The first change of each object should be retained and not cleaned.
B. Now we can invert the JOIN-condition to get all the rows, that we want to delete/clean:
- Change the date-comparison
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change = m.First_Change
to not-equal:
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
- Run a dry-select first, to get at least the count before deleting.
SELECT COUNT(Object_ID) AS records_to_remove
FROM table t
-- join with a table-subquery having only 2 columns to correlate
JOIN (
SELECT Object_ID, MIN(Date_of_Change) AS First_Change
FROM table
WHERE Current_Step is NULL and Change = 'change'
GROUP BY Object_ID
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
WHERE Current_Step is NULL and Change = 'change'
- Prepare the DELETE statement with JOIN (if supported by DBMS):
DELETE FROM table t
JOIN (
SELECT Object_ID, MIN(Date_of_Change) AS First_Change
FROM table
WHERE Current_Step is NULL and Change = 'change'
GROUP BY Object_ID
) m ON t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change
WHERE t.Current_Step is NULL AND t.Change = 'change'
Alternative to JOIN try USING
on other DBMS
Some DBMS do not support JOIN in DELETE statements, but alternatives like USING
:
DELETE FROM table t
USING (
SELECT Object_ID, MIN(Date_of_Change) AS First_Change
FROM table t2
WHERE t2.Current_Step is NULL AND t2.Change = 'change'
) AS m
WHERE ...
AND t.Object_ID = m.Object_ID AND t.Date_of_Change <> m.First_Change