I need to parse a user-uploaded CSV file and upload pieces of it. I would like to take the first 5 rows from the table, removing them in the process.

I'm checking out the Ruby CSV docs and I can't seem to find anything that would meet my needs.

I'm looking for something like the Hash delete method but for CSVs. Does this exist for Ruby?

CSV.foreach(csv_path, headers: false).take(5)

This is almost what I want, but I also need to delete those rows from the table in the process.

Bitwise
  • It's a bit unclear what you actually mean by "and remove them in the process". Do you mean actually remove the lines from the file? – max Dec 29 '20 at 20:04
  • Yes. I want to first take those rows and do some processing with them and then after remove them from the file/table permanently so I can repeat this process until there are no rows left in the table. – Bitwise Dec 29 '20 at 20:05
  • So what you really want to do is process the file in chunks of 5? – max Dec 29 '20 at 20:07
  • @Bitwise so to be clear before I delete my answer. You want to open the file read the first 5 lines, save the rest back to the file, and repeat? – engineersmnky Dec 29 '20 at 20:07
  • @engineersmnky Exactly. – Bitwise Dec 29 '20 at 20:08
  • Can you explain the reasoning behind this process? It is IO-intensive for reasons I cannot currently comprehend. Like @max said, do you just want to read in chunks of 5? That would not require the open/read/alter/save cycle. – engineersmnky Dec 29 '20 at 20:09
  • You can use `CSV.foreach(csv_path, headers: false).each_slice(5) do |batch| ... end` see https://stackoverflow.com/questions/12407035/ruby-csv-get-current-line-row-number for how to get the row number. – max Dec 29 '20 at 20:10
  • I basically need to assign x amount of rows to a user in the system. The CSV file uploaded could have 10,000 rows of data inside. I need to take x amount of rows and upload that file to S3 for that particular user and keep going until I'm out of rows. It's a bit of a weird interaction but it's what I need. – Bitwise Dec 29 '20 at 20:13
  • So you do not need to delete them from the original; you need to create new CSV files in chunks. Correct? If that is true then @max has provided the answer. Just `CSV.foreach(...).each_slice(x)` will create an Enumerator that yields "x" rows at a time to a given block. Then you just need to write those blocks to a new CSV. – engineersmnky Dec 29 '20 at 20:14
  • Yes, I think that would suffice @engineersmnky – Bitwise Dec 29 '20 at 20:15
  • Feel free to answer it yourself. :) – max Dec 29 '20 at 20:16
  • Fair enough, fair enough ;) @max – Bitwise Dec 29 '20 at 20:18
  • Dang, `each_slice` won't work here because I don't know what `x` will be. For some of the CSV it could be `200` and then it could be `1000` after that. `each_slice` assumes all batches need to be sliced equally. – Bitwise Dec 29 '20 at 20:35
  • I've posted an answer but after reading through the comments I've concluded that I don't understand the question. For example, why do you wish to remove rows from the resulting table rather than just skip over them? Clarification is required (with an edit). An example would be helpful. – Cary Swoveland Dec 29 '20 at 20:46
  • @Bitwise how do you know when to break? – engineersmnky Dec 29 '20 at 21:24
  • It's a quantity defined on a record. – Bitwise Dec 29 '20 at 21:25
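The comments above end with batch sizes that vary per record, which `each_slice` cannot express. One possible approach, sketched here under assumptions, is to load the rows into an array and call `Array#shift(n)`, which removes the first `n` elements and returns them. The helper name `split_into_batches` and the `quantities` list are hypothetical stand-ins for however the per-record quantity is actually obtained.

```ruby
require 'csv'

# Hypothetical helper: splits rows into consecutive batches whose sizes
# come from `quantities` (e.g. one quantity per user record).
# Works on a copy, so the caller's array is left untouched.
def split_into_batches(rows, quantities)
  remaining = rows.dup
  # Array#shift(n) removes and returns up to n leading elements.
  quantities.map { |n| remaining.shift(n) }
end

# Inline sample data; a real file could be loaded with CSV.read(path).
rows = CSV.parse("a,1\nb,2\nc,3\nd,4\ne,5\nf,6\n")
batches = split_into_batches(rows, [2, 3, 4])

# Each batch can be turned back into CSV text before uploading,
# for example with batches.first.map(&:to_csv).join
```

When fewer rows remain than a quantity asks for, `shift` simply returns what is left, so the last batch may be short. This reads the whole file into memory, which is probably acceptable for around 10,000 rows but is an assumption worth checking for larger uploads.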

1 Answer

Create a CSV file for illustration.

File.write('t.csv', <<~END
Now,is,the
time,for,all
good,Rubiests,to
come,to,the
aid,of,their
bowling,team,.
END
)

Let's look at it.

puts File.read('t.csv')
Now,is,the
time,for,all
good,Rubiests,to
come,to,the
aid,of,their
bowling,team,.

Then if rows whose indices are in the range rng are to be skipped,

require 'csv'
rng = 1..3
CSV.foreach('t.csv').reject.with_index { |_,i| rng.cover?(i) }
  #=> [["Now", "is", "the"],
  #    ["aid", "of", "their"],
  #    ["bowling", "team", "."]]

Use select rather than reject to keep only those rows instead.

More generally, if rows whose indices are in an array arr are to be skipped,

arr = [1, 3, 4]
CSV.foreach('t.csv').reject.with_index { |_,i| arr.include?(i) }
  #=> [["Now", "is", "the"],
  #    ["good", "Rubiests", "to"],
  #    ["bowling", "team", "."]]

One could also write the following.

arr = [1, 3, 4]
x, y = CSV.foreach('t.csv').partition.with_index { |_,i| arr.include?(i) }
x #=> [["time", "for", "all"], ["come", "to", "the"], ["aid", "of", "their"]] 
y #=> [["Now", "is", "the"], ["good", "Rubiests", "to"], ["bowling", "team", "."]] 
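If the selected rows really must be removed from the file itself, as the question literally asks, a rough sketch is to read all rows, take the leading ones, and write the remainder back. The helper name `take_and_remove!` is hypothetical, and the temporary file only stands in for the uploaded CSV. Rewriting the file on every call is IO-heavy, so this is an illustration rather than a recommendation.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: returns the first n rows and rewrites the file
# so that it contains only the remaining rows.
def take_and_remove!(path, n)
  rows = CSV.read(path)
  taken = rows.shift(n) # removes the first n rows from `rows`
  # Opening with 'w' truncates the file before the remainder is written.
  CSV.open(path, 'w') { |csv| rows.each { |row| csv << row } }
  taken
end

# Temporary file used purely for demonstration.
file = Tempfile.new(['demo', '.csv'])
file.write("Now,is,the\ntime,for,all\ngood,Rubiests,to\n")
file.close

first_two = take_and_remove!(file.path, 2)
remaining = CSV.read(file.path)
```

Calling the helper again on the same path would return the next rows, and it returns an empty array once the file has been exhausted.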
Cary Swoveland