def self.import(file, organization)
  counter = 0
  CSV.foreach(file.path, encoding: 'windows-1251:utf-8', headers: true) do |row|
    name = "#{row['First Name']} #{row['Last Name']}".titleize
    customer = Customer.create(
      name: name,
      phone: row['Main Phone'],
      email: row['Main Email'],
      address: row['Address'],
      repair_shop: organization
    )
    puts "#{name} - #{customer.errors.full_messages.join(',')}" if customer.errors.any?
    counter += 1 if customer.persisted?
  end
  "Imported #{counter} users."
end

This is the code I have so far. I'm importing files with about 10,000 rows, and processing them overwhelms my production server.

How could I do this in batches?

alejorivera
  • You should consider using transactions (see the sketch after these comments). It will still import all of your rows in succession, but the database won't have to rebuild the indexes for every row inserted. http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html – Alexa Y Sep 29 '15 at 15:35
  • This could help you (a sketch of that gem's approach also follows below): http://ruby-journal.com/how-to-import-millions-records-via-activerecord-within-minutes-not-hours/ – Pavan Sep 29 '15 at 15:38
  • @BenY the indexes still get modified, but what makes a single commit faster (as well as easier to recover from in the event of a problem) is that each call to commit has to wait for all data changes to be physically written to the write-ahead log. One possible workaround is asynchronous commit, but only committing at the end of the process is generally better practice anyway – David Aldridge Sep 29 '15 at 15:43
  • Thanks, great points @BenY and David. I'm not doing a transaction because I still want to process valid data and let invalid data silently fail. – alejorivera Sep 29 '15 at 16:04
  • Thanks @Pavan, that looks like it will help. I'll try implementing it. – alejorivera Sep 29 '15 at 16:04
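
For reference, a minimal sketch of the batched-transaction approach suggested in the comments. This is not code from the thread: the batch size of 500 is an arbitrary choice, and it assumes the same Customer model and CSV columns as the question.

require 'csv'

def self.import(file, organization)
  counter = 0
  # CSV.foreach without a block returns an Enumerator, so the file is still streamed
  CSV.foreach(file.path, encoding: 'windows-1251:utf-8', headers: true).each_slice(500) do |batch|
    # One transaction per batch: one commit (and one WAL flush) per 500 rows
    Customer.transaction do
      batch.each do |row|
        customer = Customer.create(
          name: "#{row['First Name']} #{row['Last Name']}".titleize,
          phone: row['Main Phone'],
          email: row['Main Email'],
          address: row['Address'],
          repair_shop: organization
        )
        counter += 1 if customer.persisted?
      end
    end
  end
  "Imported #{counter} users."
end

Since this still uses create rather than create!, an invalid row fails validation without raising, so it will not roll back the rest of its batch; that matches the asker's wish to let invalid data fail silently.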
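
And a sketch of the bulk-insert approach from the article Pavan links, using the activerecord-import gem. The method is named import_csv here because the gem itself defines an import class method on every model; validate: true and batch_size: 500 are assumptions, not settings taken from the article.

require 'csv' # plus gem 'activerecord-import' in the Gemfile

def self.import_csv(file, organization)
  customers = []
  CSV.foreach(file.path, encoding: 'windows-1251:utf-8', headers: true) do |row|
    # Build the records in memory instead of saving them one by one
    customers << Customer.new(
      name: "#{row['First Name']} #{row['Last Name']}".titleize,
      phone: row['Main Phone'],
      email: row['Main Email'],
      address: row['Address'],
      repair_shop: organization
    )
  end
  # One multi-row INSERT per 500 records; invalid records are collected, not raised
  result = Customer.import(customers, validate: true, batch_size: 500)
  "Imported #{customers.size - result.failed_instances.size} users."
end

Issuing a handful of multi-row INSERTs instead of 10,000 individual ones is where most of the speedup in the linked article comes from.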

1 Answer


Taken from https://satishonrails.wordpress.com/2007/07/18/how-to-import-csv-file-in-rails/

Simply add periodic explicit garbage collection:

def self.import(file, organization)
  counter = 0
  CSV.foreach(file.path, encoding: 'windows-1251:utf-8', headers: true).with_index do |row, i|
    name = "#{row['First Name']} #{row['Last Name']}".titleize
    customer = Customer.create(
      name: name,
      phone: row['Main Phone'],
      email: row['Main Email'],
      address: row['Address'],
      repair_shop: organization
    )
    puts "#{name} - #{customer.errors.full_messages.join(',')}" if customer.errors.any?
    counter += 1 if customer.persisted?
    GC.start if i % 100 == 0 # force garbage collection every 100 rows
  end
  "Imported #{counter} users."
end

This way your server should not run out of memory. I have checked this in practice, and it really worked.

prograils