
I found a lot of articles where people have trouble exporting large datasets to CSV with Rails. I'm able to do this, but it takes about 40 seconds per 20k rows.

Has anyone overcome this issue? I've searched everywhere for the past couple of hours and couldn't find anything that worked for me.

Thanks!

R. Leo
  • So it's doing 500 rows a second? That seems OK to me. Couldn't you have just spent the last couple of hours doing something else while it finished outputting the CSV? – Max Williams Oct 13 '15 at 14:34
  • If you do want to speed it up though, you could try to "eager load" all of your data up front, so you're not hitting the db again for every row. – Max Williams Oct 13 '15 at 14:37
  • How about generating it in multiple processes? E.g. process 1 generates records 1 to 100000 and saves them as file1.csv, and process 2 generates records 100001 to 200000 and saves them as file2.csv. After all processes have completed, use the cat command to combine the sub-files into your final file. – Calvin Oct 13 '15 at 14:37
  • Max, I don't need eager loading as it's only one table with no associations. Calvin, how would you go about generating multiple processes? – R. Leo Oct 13 '15 at 15:27
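
One way to approach the multiple-process idea from the comments is to fork workers from a single Ruby script. This is only a minimal sketch: fetch_rows is a hypothetical stand-in for the real table query, and the file names are made up for illustration. In a Rails app, each forked worker would probably also need its own database connection.

```ruby
require "csv"
require "tmpdir"

# Hypothetical stand-in for the real table query: returns rows first..last.
def fetch_rows(first, last)
  (first..last).map { |id| [id, "name_#{id}"] }
end

# Fork one worker per range; each worker writes its own part file.
def export_in_processes(ranges, dir)
  paths = ranges.each_with_index.map do |range, i|
    path = File.join(dir, "part_#{i}.csv")
    fork do
      CSV.open(path, "w") do |csv|
        fetch_rows(range.first, range.last).each { |row| csv << row }
      end
    end
    path
  end
  Process.waitall # block until every worker has exited
  paths
end

# Concatenate the part files in order, the same job `cat` would do.
def concat_files(paths, final_path)
  File.open(final_path, "w") do |out|
    paths.each { |part| IO.copy_stream(part, out) }
  end
end
```

Calling export_in_processes([1..100_000, 100_001..200_000], Dir.mktmpdir) and then concat_files on the returned paths would mirror the file1.csv / file2.csv example above. Whether this is actually faster depends on where the 40 seconds are spent, so it is worth measuring first.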

1 Answer

Suppose you want to export 1k rows to CSV. You can write a rake task that accepts a limit and an offset to pull data from the table, then launch it in batches from a Ruby script, something like this:

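batch_size = 100
offset = 0
(0..9).each do |index|
  # Start each batch as a background rake process. The task arguments are
  # quoted so the shell does not interpret the square brackets, and each
  # batch logs to its own file so the outputs do not overwrite each other.
  system("nohup rake 'my_task:load_csv[#{batch_size},#{offset},#{index}]' > rake_#{index}.out 2>&1 &")
  offset += batch_size
end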

** See this link for more on running rake in the background.

The rake task will look something like this:

namespace :my_task do
  # :environment loads the Rails app so the models are available
  task :load_csv, [:limit, :offset, :index] => :environment do |_task, args|
    # Load one batch from the table using args[:limit] and args[:offset]
    # Write the rows returned by that query to FILE_NAME_#{args[:index]}.csv
  end
end
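
The body of the task could be filled in along these lines. This is a sketch under assumptions: write_batch_csv, its parameters, and the "<prefix>_<index>.csv" naming are made up for illustration, and rows can be anything that responds to [] (for example an array of hashes). In the real task the rows would come from the table query with the limit and offset applied, ideally with a stable ordering so that batches do not overlap.

```ruby
require "csv"

# Hypothetical helper: writes one batch of rows to "<prefix>_<index>.csv"
# and returns the path of the file it wrote.
def write_batch_csv(rows, headers, prefix, index)
  path = "#{prefix}_#{index}.csv"
  CSV.open(path, "w") do |csv|
    # Assumption: only the first file gets a header row, so the merged
    # output ends up with exactly one header line.
    csv << headers if index.to_i.zero?
    rows.each { |row| csv << headers.map { |h| row[h] } }
  end
  path
end
```

Rake passes task arguments as strings, so the limit, offset, and index need a to_i before they are used as numbers; the helper above does that for the index.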

Once all the rake tasks have finished, combine the files by index. If you want to automate combining the files, you need to write some process-monitoring code. Grep for all active rake tasks and store their PIDs in an array. Then, every 15 seconds or so, check the status of each process using its PID. If a process is no longer running, remove its PID from the array. Continue until the array is empty, i.e. all rake tasks are finished, and then merge the files by their index. Hopefully this helps you. Thanks!
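
The monitoring loop and the merge step could be sketched roughly as follows. Note the swapped-in technique: instead of grepping the process list, this version assumes the PIDs belong to child processes started by the same script (for example via Process.spawn), so they can be polled with Process.waitpid. The names wait_for_all and merge_by_index and the "<prefix>_<index>.csv" naming are hypothetical.

```ruby
# Poll a list of child PIDs until every process has exited.
def wait_for_all(pids, poll_interval = 15)
  pending = pids.dup
  until pending.empty?
    # With WNOHANG, waitpid returns nil while the child is still running
    # and the PID once it has exited, so finished PIDs are dropped here.
    pending.reject! { |pid| Process.waitpid(pid, Process::WNOHANG) }
    sleep(poll_interval) unless pending.empty?
  end
end

# Merge "<prefix>_<index>.csv" files into one file, ordered by numeric index
# (so that part 10 comes after part 9, not after part 1).
def merge_by_index(prefix, final_path)
  parts = Dir.glob("#{prefix}_*.csv").sort_by { |p| p[/_(\d+)\.csv\z/, 1].to_i }
  File.open(final_path, "w") do |out|
    parts.each { |part| IO.copy_stream(part, out) }
  end
  parts
end
```

With this approach the launcher script would collect the PIDs itself, e.g. pids << Process.spawn("rake", "my_task:load_csv[100,0,0]"), then call wait_for_all(pids) followed by merge_by_index. The final file should use a name that does not match the part-file pattern, otherwise a later run would pick it up as a part.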

mandar.gokhale