14

Sometimes, data migrations are required. As time passes, code changes and migrations using your domain model are no longer valid and migrations fail. What are the best practices for migrating data?

I tried to make an example to clarify the problem:

Consider this. You have a migration

class ChangeFromPartnerAppliedToAppliedAt < ActiveRecord::Migration
  def up
    # Uses the application's User model, so callbacks added later also run here.
    User.all.each do |user|
      user.applied_at = user.partner_application_at
      user.save
    end
  end
end

This runs perfectly fine, of course. Later, you need a schema change, and you add a callback to the model that uses the new column:

class AddAcceptanceConfirmedAt < ActiveRecord::Migration
  def change
    add_column :users, :acceptance_confirmed_at, :datetime
  end
end

class User < ActiveRecord::Base
  before_save :do_something_with_acceptance_confirmed_at
end

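For you, no problem. It runs perfectly. But if your coworker pulls both of these changes today, not having run the first migration yet, he'll get this error when running the first migration, because it now loads the updated User model, whose callback touches a column that does not exist yet: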

rake aborted!
An error has occurred, this and all later migrations canceled:
undefined method `acceptance_confirmed_at=' for #<User:0x007f85902346d8>

That's not being a team player: he'll be fixing the bug you introduced. What should you have done?

oma

3 Answers

14

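This is a perfect example of the problem covered in the "Using Models in Your Migrations" section of the Rails migrations guide. Declare a minimal User class inside the migration, so that the migration does not load the application's model and its callbacks: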

class ChangeFromPartnerAppliedToAppliedAt < ActiveRecord::Migration
  # Migration-local model: no callbacks or validations from app/models.
  class User < ActiveRecord::Base
  end

  def up
    User.all.each do |user|
      user.applied_at = user.partner_application_at
      user.save
    end
  end
end

Edited after Mischa's comment

class ChangeFromPartnerAppliedToAppliedAt < ActiveRecord::Migration
  class User < ActiveRecord::Base
  end

  def up
    User.update_all('applied_at = partner_application_at')
  end
end
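
To see why declaring the class inside the migration helps, here is a minimal plain-Ruby sketch of the constant lookup involved. It deliberately uses no Rails at all: both User classes and their save methods are hypothetical stand-ins that only mimic the shape of the example above, and the error message is hard-coded for illustration.

```ruby
# Stand-in for the application's model after the later change: saving now
# fails because it touches a column that older schema versions do not have.
class User
  def save
    raise NoMethodError, "undefined method `acceptance_confirmed_at='"
  end
end

class ChangeFromPartnerAppliedToAppliedAt
  # Migration-local class. Inside this namespace the bare constant `User`
  # resolves here first, so the application's User#save is never reached.
  class User
    def save
      true
    end
  end

  def up
    User.new.save
  end

  # Name of the class that `User` resolves to inside the migration.
  def model_name_in_use
    User.name
  end
end

migration = ChangeFromPartnerAppliedToAppliedAt.new
puts migration.up                 # prints "true"
puts migration.model_name_in_use  # prints "ChangeFromPartnerAppliedToAppliedAt::User"
```

The second line of output shows that the migration-local class has a different fully qualified name from the application's model. That is worth keeping in mind wherever a class name is stored or compared, which appears to be the concern about polymorphic associations raised in the comments below.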
TheGeorgeous
Salil
  • Why the downvote? Would someone bother to comment on why it's not correct before voting the answer down? – Salil Oct 24 '12 at 08:36
  • fabulous answer Salil. Only ignorants would downvote this. I'm just annoyed I didn't think of this solution myself. Brilliant! :) – oma Oct 24 '12 at 08:41
  • 2
    Whoever downvoted cancelled it already. No worries! And obviously this is the correct answer. Only one remark @oma: I would recommend using `User.update_all('applied_at = partner_application_at')` instead of `User.all.each` etc. – Mischa Oct 24 '12 at 08:42
  • If you do it the way Mischa described, you don't need to declare the User class. In that case you can also do it directly in SQL. Or you can use the #reset_column_information method from ActiveRecord::Base and do the first example without declaring the model. http://guides.rubyonrails.org/migrations.html#using-models-in-your-migrations – ChuckE Oct 24 '12 at 13:56
  • This approach has pitfalls and seems incorrect because it duplicates class definitions. If the real classes change (for example, their associations), this code won't work. There is one more problem, with polymorphic associations: the classes in this example are defined in the scope of ChangeFromPartnerAppliedToAppliedAt, so a polymorphic association with some other class won't work. The solution here is to write tests for migrations. I've created a gem for this problem. Check it out: https://github.com/ka8725/migration_data – ka8725 Jan 29 '14 at 10:23
  • 1
    So the link to RailsGuides is dead because that section has been removed from the docs. Here is the PR that removed the section in case you want to read it (like I did) https://github.com/rails/rails/pull/14208/files?short_path=0ffb86f#diff-0ffb86fcbcebd97b9d77338e08102907 – CamelBlues Apr 01 '15 at 16:19
13

Best practice is: don't use models in migrations. Migrations change the schema that ActiveRecord (AR) maps to, so do not use models in them at all. Do it all with SQL. That way the migration depends only on the database and keeps working as the models change.

This:

User.all.each do |user|
  user.applied_at = user.partner_application_at
  user.save
end

I would do it like this:

update "UPDATE users SET applied_at=partner_application_at"
ChuckE
  • Seems cleaner than what I was going to suggest, where you get really careful after added/changed columns, and any hooks involving them are wrapped in `begin-rescue-end` blocks. –  Oct 24 '12 at 08:36
  • 2
    Why is it best practice not to use models in migrations? Any sources to back that up? Salil's answer seems to contradict your statement. – Mischa Oct 24 '12 at 08:49
  • 2
    It is best practice because inherent changes to the schema change the way AR maps attributes. That is, you may be using certain getters/setters/methods in your model implementation which will eventually be updated or removed. That means that if you run migrations from start to finish, they will break. You'll always be dependent on having a stable schema. And of course, my main argument: migrations work on the database, so they should speak a language understood there. Even the Rails DSL for migrations (create_table and all that) is translated into raw SQL in the end. Allowing AR models to be instantiated – ChuckE Oct 24 '12 at 08:55
  • in the migrations is just one more Rails/AR contradiction. – ChuckE Oct 24 '12 at 08:56
  • 2
    Fine, that's your opinion. No problems with that. But if it's a best practice, as you claim, there must be some sources to back it up. Preferably by Rails-core members. – Mischa Oct 24 '12 at 09:07
  • One more thing: they don't say it is bad practice, yet they provide a "quick fix" (the reset_column_information, which by the way, is the answer to this question) to mitigate its effects. But just because it is possible, it doesn't mean it is inherently correct. They provide a DSL for creating, updating tables, add columns, etc, but none to handle data. For this they put the MySqlAdapter connection methods at your disposal in the migration scope. – ChuckE Oct 24 '12 at 13:53
  • 1
    ChuckE, you are right. Using raw SQL in migrations is the most robust way to keep this code up to date. Recently I came up with a simple solution which may be easier. I described it in my blog post. Please check it out: http://railsguides.net/2014/01/30/change-data-in-migrations-like-a-boss/ – ka8725 Feb 04 '14 at 09:07
  • ChuckE @Mischa ka8725 It's been a while and I must admit that I lean more towards the raw SQL approach and would recommend that. I think the "what if you suddenly change DB" arguments to be pretty lame - it's not realistic and you'd do it for a reason requiring you to change loads of code too. What I see more of is the dynamics of the app and how the model changes, you may even split up models, or join them, so I recommend raw SQL :) Thanks guys, I appreciate your thoughts and insights, you rock ! – oma Jun 22 '14 at 11:33
0

Sometimes 'migrating data' cannot be performed as part of a schema migration, as discussed above. Sometimes 'migrating data' means 'fix historical data inconsistencies' or 'update your Solr/Elasticsearch index', so it's a complex task. For these kinds of tasks, check out this gem: https://github.com/OffgridElectric/rails-data-migrations

This gem was designed to decouple Rails schema migrations from data migrations, so that data migrations don't cause downtime at deploy time and are easier to manage overall.

serggl
  • Interesting. Often data migrations happen because of a schema migration. Often, not always: I agree that we also occasionally have to fix data, although that is more than likely related to a schema migration designed to prevent more of the same bad data. So I don't really think this is the way to go; any data migration is directly linked to a schema version in order for it to be runnable. Thoughts? – oma Nov 16 '16 at 11:07