16

I have a TSV file with no quote chars. Whenever a \t occurs in the data, it is always to separate columns, and never a part of a column value. Whenever a " occurs, it is always a part of a column value, and never to enclose column values.

I would like to read this file with Ruby's CSV library, but it raises:

/Users/.rvm/rubies/ruby-1.9.3-p545/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift': Illegal quoting in line 9506. (CSV::MalformedCSVError)

My code is:

CSV.foreach(input_file, { :col_sep => "\t", :headers => true}) do |row|
   puts row
end

Any way to get around this problem?

Popcorn

2 Answers

31

Turns out I could fix it by passing :quote_char => "\x00", which tricks the parser into treating the zero byte as the quote character. Since that byte never occurs in my data, no field is ever read as quoted.
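A minimal sketch of this workaround, written with keyword options so it also runs on current Ruby versions. The sample data is made up for illustration, and the approach assumes the zero byte never appears in the real file:

```ruby
require "csv"

# Hypothetical sample data: tab-separated, with a stray double quote inside a value.
data = "name\tsize\nwidget 5\" wide\t12\n"

# "\x00" is assumed never to occur in the data, so no field is treated as quoted.
rows = CSV.parse(data, col_sep: "\t", headers: true, quote_char: "\x00")

rows.each { |row| puts row.to_h.inspect }
```

The same options can be passed to CSV.foreach when reading from a file instead of a string.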

Popcorn
3

The liberal_parsing option is available for cases like this. From the documentation:

When set to a true value, CSV will attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields.

In your example this would be:

CSV.foreach(input_file, { :col_sep => "\t", :headers => true, :liberal_parsing => true }) do |row|
  puts row
end
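
As a quick check on an in-memory string, here is a small sketch with made-up sample data. Note that, as far as I know, liberal_parsing was only added in Ruby 2.4, so it is not available on the 1.9.3 install shown in the question:

```ruby
require "csv"

# Hypothetical sample data: a double quote in the middle of an unquoted field.
data = "id\titem\n1\t3.5\" floppy\n"

# liberal_parsing lets the parser accept the stray quote as ordinary text.
liberal_rows = CSV.parse(data, col_sep: "\t", headers: true, liberal_parsing: true)

liberal_rows.each { |row| puts row["item"] }
```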
Will Madden