I have a ruby script that's going to take about 5 hours to crunch on my specced-out rmbp. Problem: We need it within 2 hours.

The code being run is not thread-safe, and it produces a CSV file from an XLSX input. I've never used server farms before, but I'm guessing non-thread-safe Ruby isn't exactly their thing(?)

In short, is there any sort of server farm or service or any way at all I can crunch a ruby script that takes 5 hours in less than an hour or two?

Charles
PlankTon
  • By "rmbp", do you mean a "MacBook Pro with Retina Display"? A laptop is not going to run as fast as a desktop model; they trade off performance for portability. Have you run a [profiler](https://github.com/rdp/ruby-prof) to see where the most-used blocks are, and does it make sense that they are slow? – the Tin Man Dec 25 '12 at 06:37
  • Hi Tin Man, thanks for the response. The script itself is efficient. Being on holidays, I only have access to relatives' desktops, which are slower than my laptop. In the long term, I'm really looking for parallel options beyond what a standard desktop computer can do. – PlankTon Dec 25 '12 at 08:31
  • You really need to split up the task into more sub-tasks. When you run a single Ruby process, it runs on a single core, even if you multithread it. If you could split the task into 4 sub-tasks and run them in separate processes, for example, you'd already get a significant performance boost from using multiple cores. – Casper Dec 25 '12 at 08:49
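
The sub-task idea from the last comment can be tried locally before reaching for a hosted service. Below is a minimal sketch, assuming the rows are already loaded as an array of arrays and that each row can be processed independently; `process_row` and `parallel_to_csv` are hypothetical names, and the upcasing is only a stand-in for the real work. Each child process writes its own partial CSV, so the non-thread-safe code never shares state, and the parent stitches the parts together in order. It relies on `fork`, so it assumes a Unix-like system such as macOS or Linux.

```ruby
require "csv"
require "tmpdir"

# Hypothetical stand-in for the real (slow) per-row work.
def process_row(row)
  row.map { |cell| cell.to_s.upcase }
end

# Split rows into one slice per worker process, let each child write its own
# partial CSV, then concatenate the parts in their original order.
def parallel_to_csv(rows, out_path, workers: 4)
  return File.write(out_path, "") if rows.empty?

  slice_size = (rows.size / workers.to_f).ceil
  Dir.mktmpdir do |dir|
    parts = rows.each_slice(slice_size).each_with_index.map do |slice, i|
      part = File.join(dir, format("part-%04d.csv", i))
      # Each child is a separate OS process, so it can run on its own core.
      pid = fork do
        CSV.open(part, "w") { |csv| slice.each { |r| csv << process_row(r) } }
      end
      [pid, part]
    end

    # Wait for every child before reading any partial file.
    parts.each { |pid, _| Process.wait(pid) }

    File.open(out_path, "w") do |out|
      parts.each { |_, part| out.write(File.read(part)) }
    end
  end
end
```

With `workers` set to the number of cores, the best case is a speedup close to that number, provided the time is spent processing rows rather than reading the XLSX file.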

1 Answer

I've done something similar using Heroku and Sidekiq. Heroku provides both free and mini-plans for low-cost computing, and the Sidekiq gem lets you chunk out your work to several workers, letting them run simultaneously, thus finishing faster.

Do you know where your bottleneck is? Is it the input of the XLSX file or the output to CSV? If it's the latter, read in your XLSX file and hand chunks of it out to Sidekiq workers.

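For example, you could split the rows of your XLSX file into chunks of 1,000 and queue one job per chunk:

require "sidekiq"

# Queue one job per chunk of 1,000 rows.
rows.each_slice(1000) do |chunk|
  MySidekiqWorker.perform_async(chunk)
end

And your MySidekiqWorker might look like:

class MySidekiqWorker
  include Sidekiq::Worker

  # chunk is an array of rows; Sidekiq job arguments must be JSON-serializable
  def perform(chunk)
    chunk.each do |row|
      # append processed data to CSV
    end
  end
end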

Your workers do all the heavy lifting in parallel, so with enough of them the job should finish well before a single process on your laptop would.

jbnunn