I've seen a couple of posts about this with either no real answers or out-of-date ones, so I'm wondering if there are any newer solutions. I have an enormous CSV I need to read in. I can't call open() on it because it kills my server, so I have no choice but to use .foreach().
Doing it this way, my script will take 6 days to run. I want to see if I can cut that down by using Threads and splitting the task in two or four, so that one thread reads lines 1 through n while another simultaneously reads lines n+1 through the end.
So I need to be able to only read in the last half of the file in one thread (and later if I split it into more threads, just a specific line through a specific line).
Is there any way in Ruby to do this? Can this start at a certain row?
CSV.foreach(FULL_FACT_SHEET_CSV_PATH) do |trial|
EDIT: Just to give an idea of what one of my threads looks like:
threads << Thread.new {
  CSV.open('matches_thread3.csv', 'wb') do |output_csv|
    output_csv << HEADER
    count = 1
    index = 0
    CSV.foreach(CSV_PATH) do |trial|
      index += 1
      if index > 120000
        break if index > 180000
        # do stuff
      end
    end
  end
}
But as you can see, it has to iterate the file until it gets to record 120,000 before it starts. So the goal would be to eliminate reading all of the rows before row 120,000 by starting to read at row 120,000.
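A minimal sketch of one way to start parsing at a given row, under a loud assumption: every record sits on exactly one physical line (no quoted fields containing newlines). The idea is to make one cheap pass with plain `gets` (no CSV parsing) to record the byte offset where each chunk of lines begins, then have each worker `seek` to its offset and hand the positioned IO to `CSV.new`. The helper names `chunk_offsets` and `each_row_from` are hypothetical, not part of the CSV library.

```ruby
require 'csv'

# Hypothetical helper: scans the file once, without CSV parsing, and returns
# the byte offset at which each chunk of `chunk_size` lines begins.
# Assumes one record per physical line. If the file has a header row,
# it is counted as the first line of the first chunk.
def chunk_offsets(path, chunk_size)
  offsets = []
  File.open(path) do |io|
    until io.eof?
      offsets << io.pos
      # Skip over one chunk of lines; stop early if the file ends.
      chunk_size.times { break unless io.gets }
    end
  end
  offsets
end

# Hypothetical helper: seeks to `offset` and yields at most `limit`
# parsed rows from that position onward.
def each_row_from(path, offset, limit)
  File.open(path) do |io|
    io.seek(offset)
    csv = CSV.new(io) # parsing starts at the current IO position
    limit.times do
      row = csv.shift
      break if row.nil? # end of file
      yield row
    end
  end
end
```

With something like this, each thread would call `each_row_from(CSV_PATH, offsets[i], 60000)` for its own `i` instead of counting up from the first row. The offset pass still reads the whole file once, but it only splits on newlines and only has to run once for all threads.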