I have a very large table (dozens of millions of rows) and a UNIQUE index needs to be added to a column on that table. I know for a fact that the table does contain duplicated values on that key, which I need to clean up (by deleting rows/resetting the value of the column to something unique that I can automatically generate). A plus is that the rows which are already duplicated do not get modified anymore.
What would be the right approach to perform a change like this, given that I will be probably using the Percona pt-osc tool and there are continuous deletes/inserts on the table? My plan was:
- Add code that ensures no dupe IDs get inserted anymore. Probably I need to add a separate table for this temporarily, since I want the database to enforce this for me and not the application - so insert into the "shadow table" with a unique index in a transaction together with my main table, rollback all inserts that try to insert duplicate values
- Backfill the table by zapping all invalid column values which are within the primary key range below
$current_pkey_value
- Then add the index and use pt-osc to changeover the table
Is there anything I am missing?