How to write a Rake task that imports data and handles deletions?

Question

I want to do the same thing as described in the question "How to write Rake task to import data to Rails app?".

However, I am not satisfied with the accepted answer because it does not account for items that have been deleted from the source.

What is the simplest, most Rails-conformant way to do this while also handling entries that were deleted from the source?

Notes:

When using .find_or_initialize_by_identifier and never deleting, stale entries remain in the table.
When using .delete_all before each import, as far as I know, the primary key counter is not reset, so its values approach the limit quickly.
I could drop the table and use ::Migrations.create_table in the Rake task, but then the definitions in the schema and migrations must be kept in sync with the code in the Rake task, which seems undesirable.

This may give you a way: http://tech-brains.blogspot.in/2012/12/how-to-populate-database-using-data.html — RAJ, Apr 10 '15 at 10:10

Max Williams · Accepted Answer · 2015-04-10T09:31:52.213

You definitely should not delete all the records and then recreate them all from the data. This will create all sorts of problems, e.g. breaking any foreign key fields in other tables that used to point to the object before it was deleted. It's like knocking a house down and rebuilding it in order to have a different coloured door. So "see if it's there; if it is, update it (if it's different); if it's not, create it" is the right strategy to use.

You don't say what your criteria for deletion are, but if it is "any record which isn't mentioned in the import data should be deleted" then you just need to keep track of some unique field from your input data and then delete all records whose own unique field isn't in that list.

So, your code to do the import could look something like this (copying the code from the other question; this code sets the data in a horribly clunky way, but I'm not going to address that here):

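namespace :data do
  desc "import data from files to database"
  task :import => :environment do
    identifiers = []
    # "<file to import>" is a placeholder; File.foreach closes the file when done
    File.foreach("<file to import>") do |line|
      #disclaimer: this way of setting the data from attrs[0], attrs[1] etc is crappy and fragile and is not how i would do it
      attrs = line.chomp.split(":")
      identifier = attrs[0]
      next if identifier.blank? # skip empty lines
      identifiers << identifier
      # always returns a record (found or new), so no `if` is needed;
      # on Rails 4+, use Product.find_or_initialize_by(identifier: identifier)
      product = Product.find_or_initialize_by_identifier(identifier)
      product.name = attrs[1]
      # ...set the remaining attributes the same way
      product.save!
    end
    #destroy any which didn't appear in the import data
    Product.where("identifier not in (?)", identifiers).each(&:destroy)
  end
end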
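
For anyone who wants to try the strategy without a database, here is a minimal plain-Ruby sketch of the same idea: update or create each imported row, remember its identifier, then remove whatever was not seen. The Hash standing in for the table, the `FIELDS` list, and the `parse_line` and `sync` helpers are all hypothetical names for illustration, not part of Rails or of the code above. It also shows one possible way to map fields by name rather than by position.

```ruby
require "set"

# Assumed field order for each "identifier:name" line; adjust to the real file.
FIELDS = %i[identifier name].freeze

# Turn one raw line into a hash keyed by field name instead of attrs[0], attrs[1].
def parse_line(line)
  FIELDS.zip(line.chomp.split(":", FIELDS.size)).to_h
end

# Sync an in-memory "table" (identifier => attributes) with the import lines.
# Returns the identifiers that were removed.
def sync(table, lines)
  seen = Set.new
  lines.each do |line|
    row = parse_line(line)
    next if row[:identifier].nil? || row[:identifier].empty?
    seen << row[:identifier]
    # update-or-create: existing entries are updated in place, new ones are added
    (table[row[:identifier]] ||= {}).merge!(name: row[:name])
  end
  # anything not mentioned in the import data is considered deleted at the source
  stale = table.keys.reject { |id| seen.include?(id) }
  stale.each { |id| table.delete(id) }
  stale
end

table = { "a1" => { name: "Old" }, "b2" => { name: "Gone" } }
removed = sync(table, ["a1:Apple\n", "c3:Cherry\n"])
# removed is ["b2"]; "a1" was updated and "c3" was created
```

Because existing entries are updated rather than recreated, anything that refers to them by key stays valid, which is the point of the strategy.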

score 0 · Answer 2 · answered Apr 22 '15 at 19:53

What I went with is .delete_all, combined with a table schema that omits Rails' default auto-incrementing id column, so that there are no ever-growing id values after .delete_all.

create_table :airport_locations, id: false do |t|
  t.string :iata_faa_code, primary_key: true
  t.float :latitude
  t.float :longitude
end
add_index :airport_locations, :iata_faa_code

Notes:

The dataset is rather small (~5000 entries) and updates happen infrequently.
Tracking deleted items as explained in Max Williams' answer is doable if the table is small. Tables with several thousand entries, though, would probably require a lot of memory, and more complex strategies for finding deleted entries (for example, using temporary tables) could become necessary.
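
If the list of identifiers did get large, one option short of temporary tables is to compute the stale identifiers once and then delete them in fixed-size batches. This is only a rough plain-Ruby sketch; `stale_in_batches` and the batch size are assumptions for illustration. In a Rails app the two input lists might come from something like `AirportLocation.pluck(:iata_faa_code)` and the parsed import file, where `AirportLocation` is an assumed model name.

```ruby
require "set"

# Return the identifiers that exist in the table but not in the import,
# grouped into batches of at most batch_size (hypothetical helper).
def stale_in_batches(existing_ids, imported_ids, batch_size: 1000)
  imported = imported_ids.to_set # Set lookup avoids scanning the array per id
  existing_ids.reject { |id| imported.include?(id) }
              .each_slice(batch_size)
              .to_a
end

batches = stale_in_batches(%w[AAA BBB CCC DDD EEE], %w[AAA CCC], batch_size: 2)
# batches is [["BBB", "DDD"], ["EEE"]]
```

Each batch could then be passed to a single delete statement, which keeps the size of any one query bounded.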
