
Given the following CSV file, how would you remove all rows that contain the word 'true' in the column 'foo'?

Date,foo,bar
2014/10/31,true,derp
2014/10/31,false,derp

I have a working solution; however, it requires making a secondary CSV object, csv_no_foo:

require 'csv'

# With headers: true, CSV.read returns a CSV::Table  #http://bit.ly/1mSlqfA
@csv = CSV.read(@csvfile, headers: true)
@headers = @csv.headers

# Make a new, empty table for the rows to keep
@csv_no_foo = CSV::Table.new([])

@csv.each do |row|
  if row['foo'] == 'false'
    @csv_no_foo << row
  else
    puts "not pushing row #{row}"
  end
end

Ideally, I would just remove the offending row from the CSV like so:

...
  if row['foo'] == 'true'
    @csv.delete(true) #Doesn't work
...

Looking at the Ruby documentation, the Row class appears to have a delete_if method, but I'm confused about the syntax it requires. Is there a way to remove the row without making a new CSV object?

http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV/Row.html#method-i-each
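For reference, CSV::Row#delete_if and CSV::Table#delete_if take different block arguments. The sketch below parses the sample data from a string (an assumption made so the example is self-contained) and shows both: the Row version removes fields from a single row, while the Table version removes whole rows.

```ruby
require 'csv'

# Sample data from the question, parsed from a string for self-containment
data = "Date,foo,bar\n2014/10/31,true,derp\n2014/10/31,false,derp\n"
table = CSV.parse(data, headers: true) # a CSV::Table because headers: true

# CSV::Row#delete_if removes *fields* from one row: the block receives a
# header/value pair for each field.
row = table[0]
row.delete_if { |header, _value| header == 'bar' }
p row.fields # prints ["2014/10/31", "true"]

# CSV::Table#delete_if removes *rows* from the table: in the default mode
# the block receives each CSV::Row.
table.delete_if { |r| r['foo'] == 'true' }
p table.size # prints 1
```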

spuder
  • Are you sure you must use ruby? I'm thinking `awk` might work too – Jared Beck Nov 03 '14 at 03:55
  • I take it back :) `awk` is a bad choice because `,` can be a delimiter or part of a quoted value. – Jared Beck Nov 03 '14 at 04:00
  • Rewriting the CSV while removing the offending rows is the solution. You're trying to remove a sequence of bytes from the middle of a file with variable length records, the usual way to do that is to copy the file and filter it along the way. – mu is too short Nov 03 '14 at 05:51
  • Thanks, though I'm not sure I understand. How do I rewrite the CSV? Do you mean rewrite to the disk? I still have more operations to do before writing to disk, and I'd like to avoid reading in the CSV twice. – spuder Nov 03 '14 at 06:12
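The copy-and-filter approach suggested in the comments can be sketched as follows. This is a minimal sketch, assuming the column is named foo; copy_without_true_rows, src and dest are hypothetical names. It streams rows with CSV.foreach and writes them with CSV.open, so only one row is held in memory at a time.

```ruby
require 'csv'

# Hypothetical helper (name is illustrative): copy src to dest, skipping
# rows whose 'foo' column is 'true'. Rows are read and written one at a
# time, so the whole file never has to be held in memory.
def copy_without_true_rows(src, dest)
  CSV.open(dest, 'w') do |out|
    # return_headers: true yields the header line first as a CSV::Row,
    # so it is copied even when no data rows survive the filter.
    CSV.foreach(src, headers: true, return_headers: true) do |row|
      next if !row.header_row? && row['foo'] == 'true'
      out << row
    end
  end
end
```

If the original file should be replaced, dest can afterwards be moved over it with File.rename(dest, src).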

2 Answers


You should be able to use CSV::Table#delete_if. CSV.table always gives you a CSV::Table object, whereas CSV.read returns an Array of Arrays unless you pass headers: true, in which case it returns a CSV::Table as well. Be aware that CSV.table also converts the headers to symbols (so the column is accessed as row[:foo]) and converts numeric fields to numbers.

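require 'csv'

# CSV.table reads with headers: true and symbolized headers
table = CSV.table(@csvfile)

# Drop every row whose foo column is the string 'true'
table.delete_if do |row|
  row[:foo] == 'true'
end

# Write the filtered table back over the original file
File.open(@csvfile, 'w') do |f|
  f.write(table.to_csv)
end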
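Because delete_if mutates the table in place, the filtering can happen in memory and the file only needs to be written once, after any further processing. A minimal sketch on the question's sample data, parsing a string with the option set CSV.table is documented to use (headers: true, converters: :numeric, header_converters: :symbol); note that the Date header comes back as :date, which is also how it will be written out.

```ruby
require 'csv'

data = "Date,foo,bar\n2014/10/31,true,derp\n2014/10/31,false,derp\n"

# Same options CSV.table uses, but parsing a string instead of a file path
table = CSV.parse(data, headers: true, converters: :numeric, header_converters: :symbol)

p table.headers # prints [:date, :foo, :bar]  ("Date" was downcased)
table.delete_if { |row| row[:foo] == 'true' }

# The table is still an in-memory object here; serialize only when done
puts table.to_csv
# date,foo,bar
# 2014/10/31,false,derp
```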
Patrick Oscity
  • I'm using this approach, and if the deleted row is the last remaining row in the table, then the entire CSV's contents are deleted; in other words, after all rows are removed, the headers are also removed. Does anyone know how to prevent the headers from being removed? – sealocal Dec 03 '15 at 10:12
  • @sealocal maybe it helps to add the option `write_headers: true` when calling `to_csv` (can't test it right now). – Patrick Oscity Dec 03 '15 at 12:45
  • Hmm, I tried that, without success. I'm not sure if I'm using it wrong, though. I decided to check whether the new file has fewer than 2 rows, and if so rewrite it as a CSV with one row: the headers row. – sealocal Dec 03 '15 at 18:16
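Whether the header line survives an emptied table may depend on the csv library version. One way around it, sketched here under the assumption that the column is named foo, is to capture the headers before deleting and write them explicitly. remove_true_rows is a hypothetical helper name. It uses CSV.read with headers: true rather than CSV.table so that the original header spelling (Date rather than date) is preserved on write.

```ruby
require 'csv'

# Hypothetical helper (name is illustrative): drop rows whose 'foo' column
# is 'true' and write the result back, always keeping the header line.
def remove_true_rows(path)
  # headers: true makes CSV.read return a CSV::Table with the original
  # string header names, so "Date" stays "Date".
  table = CSV.read(path, headers: true)
  headers = table.headers # capture before any rows are deleted

  table.delete_if { |row| row['foo'] == 'true' }

  CSV.open(path, 'w') do |csv|
    csv << headers                         # header row, even if no data rows remain
    table.each { |row| csv << row.fields } # remaining data rows
  end
end
```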

You might want to filter rows in a Ruby manner:

require 'csv'
# reject keeps the rows whose foo column is not 'true'
# (headers are strings here, not symbols)
csv = CSV.parse(File.read(@csvfile), col_sep: ",", headers: true)
         .reject { |row| row['foo'] == 'true' }

Hope it helps.
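Note that reject (from Enumerable) returns a plain Array of CSV::Row objects rather than a CSV::Table. A minimal sketch of turning that Array back into CSV text, parsing the question's sample data from a string for self-containment; the variable names are illustrative. It falls back to the original headers because a CSV::Table built from rows takes its header line from those rows.

```ruby
require 'csv'

data = "Date,foo,bar\n2014/10/31,true,derp\n2014/10/31,false,derp\n"
parsed = CSV.parse(data, headers: true)

# Enumerable#reject returns a plain Array of CSV::Row objects
kept = parsed.reject { |row| row['foo'] == 'true' }

# CSV::Table takes its header line from its rows, so fall back to the
# original headers when every row was rejected.
output = if kept.empty?
           parsed.headers.to_csv
         else
           CSV::Table.new(kept).to_csv
         end
puts output
```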

Aleksei Matiushkin