0

I have a model, Clients and a corresponding database with lastname and firstname columns. Originally there were no constraints on the uniqueness of [lastname, firstname], and the database currently contains duplicates. I would like to clean up the database and impose constraints on the model, such as: validates_uniqueness_of :lastname, scope: :firstname.

Idea that comes to my mind is to back up the data in some fashion, impose constraints on an empty model database and then pull the data back in with duplicates that I can now process separately rescueing from exception.

I feel, however, that I am doing something off-the-wall here.

Is there a better, "rails way" to do this?

Alexei Danchenkov
  • 2,011
  • 5
  • 34
  • 49

1 Answers1

1

The only really pure-Rails way to discover the problems is to run through every model and ensure that it is still valid. For instance, roughly:

Client.all.each do |client|
  unless (client.valid?)
    puts "Client #{client.id} invalid: #{client.errors.full_messages}"
  end
end

Loading all records might be a bad idea if it requires too much memory. ActiveRecord 3.0 is supposed to be smarter about this, loading it in chunks, but I can't prove this is the case at the moment.

As for what you do with the duplicate data:

  • Always back up your tables before starting using the appropriate database snapshotting tool.
  • Always test your modifications on a copy of the data before running this on a production database.
  • Always document your changes by writing a Rails migration that performs the operation in a reliable and predicable manner. Test it repeatedly before deploying.

I would presume that your production database is subjected to regular snapshots as it is, in which case you can grab test data from there. If this is not the case, your first priority should be to ensure it is.

tadman
  • 208,517
  • 23
  • 234
  • 262
  • I still wonder how to cleanup the invalid records, though. `client.destroy` would not work as client is invalid, I presume. Solved this by adding an extra field 'dupe' and changing uniqueness validation to: `validates_uniqueness_of :lastname, scope: [:firstname, :dupe]`. `client.dupe = true; client.destroy` then worked (naturally). – Alexei Danchenkov Jul 04 '12 at 17:45
  • 1
    Use Client.find_each if you want to iterate in batches – Frederick Cheung Jul 04 '12 at 19:34