2

I'm building a Rails 3.2 app upon a legacy database which also has some broken records in different tables. One of the issues giving the most headache is that it includes invalid dates.

I've setup a sandbox which I manually fixed one time to get my code working. Now it's time for deployment. For this reason, the sandbox is reset every night and copied from the live database, ferret indexes are rebuilt, and migrations are re-applied. We are going to deploy to the sandbox often to get in the last fixes before deploying to the live setup.

As the legacy PHP app and this new Rails app need to run in parallel for a few weeks to months, we cannot simply one-time-fix the dates (Update: just for clarification, that means they run on the same database at the same time). I need a way to automate this, maybe with a migration or rake task (I'd go for the latter).

But the problem is: ActiveRecord chokes on loading such records so I have no way to investigate the record and fix the dates by some hardcoded assumptions made in ruby code.

A second problem is that the legacy database has inconsistencies because the PHP code did not use transactions and some code paths are broken and left orphans and broken table constraints behind. I will deal with that as they occur, most of them is already taken care of in the models. First problem goes with the dates.

How would you usually fix this? Maybe there's even some magic gem out there which supports migrating legacy databases with broken records by intercepting exceptions and running some try-to-fix code...

The migration path uses MySQL, and three production environments (stable with live database, staging with the same database, and sandbox with a database clone reset every night). We decided against doing a one-time data mapping / migration because we cannot replace the complete legacy application in one step (it consists of a CMS with about 50000 articles, hundreds of topics, huge file database with images and downloads, supporting about 10 websites, about 12 years of data and work, messy PHP code from different programming skills, duplicated code from different migration stages, pulling in RSS content from partner sites to mix articles/posts from there into the article timelines in our own application's topics, and a lot more fun stuff...

First step is to migrate the backend application to get a consistent admin and publishing interface. The legacy frontend applications still need to write to the database (comments and other content created by visitors). So the process of fixing the database must be able to run unattended on a regular basis.

We already have fixes in place that gracefully handle broken model dependencies in belongs_to and has_many. Paperclip integration has been designed to work with all the fantastic filename mappings invented. And the airbrake gem reports all application crashes to our redmine installation so we get a quick overview of all the left quirks.

The legacy applications have already been modified to work with the latest MySQL version and has been migrated to a current MySQL database server.

hurikhan77
  • 5,881
  • 3
  • 32
  • 47

4 Answers4

3

I had the same problem. The solution was to tell mysql2 not to perform casting, like this:

client.query(sql, cast: false).each do |row|
  row['some_date'] = Date.parse(row['some_date']) rescue(nil)
end

See mysql2 documentation for details on how to build client object. If required, access rails db config via ActiveRecord::Base.configurations.

Zubin
  • 9,422
  • 7
  • 48
  • 52
  • @hurikhan77 It solves the problem by telling mysql2 not to typecast the date, which allows us to rescue gracefully and set an invalid date to nil. How is that cumbersome? Please mark this as correct if it solves your problem, as it did for me. – Zubin Oct 21 '14 at 02:06
  • I will mark it as solved as soon as I get back to the project to test it. BTW, "cumbersome" was not meant negative, please read as "complicated compared to native AR usage". Especially, I'd like to not write SQL commands. I hope there is a way to stop typecasting within AR. – hurikhan77 Oct 22 '14 at 10:51
  • @hurikhan77 This works. I wrote a migration class that iterated through the records once and updated the 'bad' fields. Problem Solved and yes annoying to have to manually do. – blnc Nov 21 '14 at 18:12
  • thanks a lot for this solution. – JL M Mar 04 '22 at 10:39
0

Create a data import rake task that does all the conversions and fixes you need (including the data parsing and fixing), and run it every time you get a fresh update from the legacy app. The task can use raw SQL (look-up "execute" and "exec_query" methods), it doesn't have to work with models. This will be your magical "gem" that you were looking for. Obviously, you cannot have a one-fits-all tool for that, as every case of broken data is unique. But just don't create kludges in your new code base.

Slava Kravchenko
  • 377
  • 1
  • 10
  • Not sure if I understood your workflow. What I don't want is to import the data. The legacy app and the Rails app MUST use the same database at the same time because they run in parallel and we are doing a soft migration from legacy to modern step by step. Working with "execute" and "exec_query" probably would work - but in my opinion exactly that is the kludge. I should use my models to access broken data and let it fix itself by logic implemented in the model. – hurikhan77 May 06 '13 at 10:18
  • It wasn't clear from your original message that the both applications have to share the same database. Or I misunderstood something. If so, then my solution will not work for you, apparently. But I strongly object to having models "fix broken data by logic implemented in them". The logic that your model contain should be about business logic, not some hacks and workarounds for the broken data. Because those kludges will pile up, pollute your business logic and eventually make the code hard to understand. I suggest that you don't go that way :-) – Slava Kravchenko May 06 '13 at 13:35
  • Actually, it is contained in the original message: "As the legacy PHP app and this new Rails app need to run in parallel for a few weeks to months, we cannot simply one-time-fix the dates" - but I amended that so it becomes more clear. The logic in my models is designed to be graceful with errors from the old DB (we can very well guess with assumptions in the code what information is missing and insert it on the fly), the same goes for dates (actually a date like offline_at 0000-00-00 16:30 means be offline at the same day as online_at states, the old PHP code handles such cases in code). – hurikhan77 May 06 '13 at 14:27
  • When migration progresses and we can turn off old PHP code, we will remove the clutter from the models and write a one time migration to fix it all and forever. But currently we can't as legacy code can still insert or generate broken data... Most of this clutter is separated out into modules and reusable in multiple models. – hurikhan77 May 06 '13 at 14:27
  • Meanwhile I managed to create a rake task with some obscure SQL to fix the missing date details. That works for the moment. However, it's not unattended, and I suppose we find more such inconsistencies. – hurikhan77 May 06 '13 at 19:32
0

Similar to: Rails: How to handle existing invalid dates in database? and also without correct answer so I repost my solution below.

I think the simplest solution that worked for me was to set in database.yml file write cast: false, e.g. for development section

development
  <<: *default
  adapter: mysql2    
  (... some other settings ...)
  cast: false
Community
  • 1
  • 1
bartgras
  • 442
  • 3
  • 13
-1

I believe it will solve your problem Date.parse()

e.g. Date.parse(foo.created_at)

Lian
  • 1,597
  • 3
  • 17
  • 30
  • Nope, won't actually. Article.find(article_with_broken_date) will throw the exception, so there is no chance to fix the date and save the record again. – hurikhan77 May 06 '13 at 10:15