4

I have a HABTM association between user and role.

User can be an admin (role_id = 1) or a user (role_id = 2) for roles.

In the join table, roles_users, I have some redundant records. For ex:

enter image description here

I want to remove the duplicate records such as 1:1, 2:4.

Two questions:

  1. Where's the best place to execute the sql script that removes the dups -- migration? script?

  2. What is the sql query to remove the dups?

keruilin
  • 16,782
  • 34
  • 108
  • 175

2 Answers2

12
CREATE TABLE roles_users2 LIKE roles_users; -- this ensures indexes are preserved
INSERT INTO roles_users2 SELECT DISTINCT * FROM roles_users; 
DROP TABLE roles_users;
RENAME TABLE roles_users2 TO roles_users;

and for the future, to prevent duplicate rows

ALTER TABLE roles_users ADD UNIQUE INDEX (role_id, user_id);

Or, you can do all of it in one step with ALTER TABLE IGNORE:

ALTER IGNORE TABLE roles_users ADD UNIQUE INDEX (role_id, user_id);

IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

John Douthat
  • 40,711
  • 10
  • 69
  • 66
  • It's ALTER IGNORE TABLE instead of ALTER TABLE IGNORE. But IGNORE doesn't work all the time, better use @John Douthat first suggestion by creating a temporary table. – Jad B. Oct 31 '14 at 16:18
3

The simplest is to copy the data into a new table, minus the duplicates:

CREATE TABLE roles_users2 AS
SELECT DISTINCT * FROM roles_users

You can then choose one of the following:

  • Drop the old table, rename the new table to the old name and add indexes.
  • Truncate the old table and insert the rows from roles_users2 back into roles_users.
Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • Truncating the old table will keep constraints and foreign key references intact. Of course, if it had sensible constraints in the first place, there would be no duplicates. (shrug) – Mike Sherrill 'Cat Recall' Feb 13 '11 at 22:45