0
SELECT *
FROM epc e
INNER JOIN epc max_
ON e.ADDRESS1 = max_.ADDRESS1
   AND e.POSTCODE = max_.POSTCODE
   AND e.INSPECTION_DATE < max_.INSPECTION_DATE;

I have a table with 13 million rows. Without the AND e.INSPECTION_DATE < max_.INSPECTION_DATE it returns really quickly, but with it, very slow.

I've created index on INSPECTION_DATE and also a composite index on ADDRESS1, POSTCODE and INSPECTION_DATE to see if any of those would work, but it's still slow. I'm sure I left it one time and it returned "0 row(s) returned" but in sqlfiddle it seemed to remove duplicates and keep most recent confused.

My main objective is to delete all duplicate rows from the database where ADDRESS1 and POSTCODE are identical, while leaving the most recent row based on INSPECTION_DATE in the table.

Scoop
  • 67
  • 6
  • Edit your question, and show the schema of the table, including indexes. – Sloan Thrasher Dec 19 '18 at 15:15
  • 1
    Your entire question is centered about making a `SELECT` faster, then, in the last sentence, you change direction and say that you want to do a `DELETE`. What do you really want to do here? Regarding the select, indices may not work because you are selecting all columns. – Tim Biegeleisen Dec 19 '18 at 15:15
  • @TimBiegeleisen I wouldn't of thought SELECT and DELETE would have been any faster or slower? I was just trying to view the results before I committed to a DELETE. Thanks – Scoop Dec 19 '18 at 15:19
  • @TimBiegeleisen What I'm trying to do is the latter part of my question. Delete all duplicates from the table, leaving only the most recent one of the duplicates – Scoop Dec 19 '18 at 15:20
  • Please add the table definitions, and also let us know exactly which columns you need to select. If you are really trying to select dozens of columns, then most likely no single index would be able to effectively cover the query. – Tim Biegeleisen Dec 19 '18 at 15:48

0 Answers0