3

I'm using FasterCSV on a Ruby on Rails application and currently it throws an Exception if the file is invalid.

I've looked over the FasterCSV doc, and it seems that if I use FasterCSV::parse with a block, it'll read the file one line at a time, without allocating too much memory. It'll throw a FasterCSV::MalformedCSV exception if there is any kind of error on the file.

I've implemented a custom solution, but I'm not sure it's the best possible one (see my answer below). I'd be interested in knowing alternatives

sren
  • 3,513
  • 4
  • 25
  • 28
kikito
  • 51,734
  • 32
  • 149
  • 189
  • 1
    Two changes: 1) You need `def self.is_valid?` 2) Remove the superfluous `return` calls. Posted as a comment instead of an answer because you already basically have this answer, and I don't want to discourage additional answers. I suggest that you should remove your own code from the question and post it as an answer after some time has passed. – Phrogz May 10 '11 at 16:58
  • Thanks for your comments! I've fixed my code (I prefer being explicit about return when it's not at the end of a method) – kikito May 10 '11 at 17:12
  • @egarcia How about `begin; parse{ ... }; true; rescue; false; end` – Phrogz May 10 '11 at 17:28
  • @Phrogz As a rubyist, I'm a big fan of one-liners. But ... please don't take it the wrong way, but I don't really like that semicolon business. On this particular case I think I'd rather be explicit about what is happening, even if the method spans several lines. – kikito May 11 '11 at 10:33
  • 1
    @egarcia To be clear, the semi-colons are only because I can't type newlines in comments. Anyhow, only a suggestion for what I consider to be more Ruby-like code. – Phrogz May 11 '11 at 13:24
  • @Phrogz ooh, I see what you mean now. You are right, it's more Ruby-like. I'm updating my answer. Thanks! – kikito May 11 '11 at 15:28

3 Answers3

1

This is my current solution. I'm really interested in knowing improvements / alternatives.

# /lib/fastercsv_is_valid.rb

class FasterCSV

  def self.is_valid?(file, options = {})
    begin
      FasterCSV.parse(file, options) { |row| }
      true
    rescue FasterCSV::MalformedCSV
      false
    end
  end

end

I use that method like this:

# /models/csv_importer.rb

class CsvImporter
  include ActiveRecord::Validations

  validates_presence_of :file
  validate check_file_format

...

  private

  def check_file_format
    errors.add :file, "Malformed CSV! Please check syntax" unless FasterCSV::is_valid? file
  end
end
kikito
  • 51,734
  • 32
  • 149
  • 189
0

I made some tests yesterday and it turns out that my solution didn't quite work; I kept getting empty arrays on valid CSVs after implementing the first is_valid . I'm not sure whether it's a FasterCSV caching issue or something in my code, and I don't know if it's related with my test setup, but I decided to go implement a safe_parse instead:

#/lib/faster_csv_safe_parse.rb
class FasterCSV

  def self.safe_parse(file, options = {})
    begin
      FasterCSV.parse(file, options)
    rescue FasterCSV::MalformedCSVError
      nil
    end
  end

end

This will return a parsed array if the file is valid, or nil otherwise. I could then implement my validations as follows:

# /models/csv_importer.rb

class CsvImporter
  include ActiveRecord::Validations

  validates_presence_of :file
  validate check_file_format
  attr_accessor csv_data

  def csv_data
    @csv_data ||= FasterCSV.safe_parse(file)
  end

...

  private

  def check_file_format
    errors.add :file, "Malformed CSV! Please check syntax" if csv_data.nil?
  end
end

I guess it would be possible to implement a safe_parse that accepts a block and parses the file line by line, but for my purposes this simple implementation was enough, and it works in all cases.

kikito
  • 51,734
  • 32
  • 149
  • 189
-1

I assume you want to parse the CSV and do something with the parsed results. Worst case is that your CSV is valid and that you parse the file again. I would write something like this to stash away the parsed result so you only have to parse the CSV once:

module FasterCSV

  def self.parse_and_validate(file, options = {})

    begin
      @parsed_result = FasterCSV.parse(file, options) { |row| }
    rescue FasterCSV::MalformedCSV
      @invalid = true
    end
  end

  def self.is_valid?
    !@invalid
  end    

  def self.parsed_result
    @parsed_result if self.valid?
  end

end

And then:

class CsvImporter
  include ActiveRecord::Validations

  validates_presence_of :file
  validate check_file_format

  # I assume you use the parsed result after the validations so in a before_save or something
  def do_your_parse_stuff
    here you would use FasterCSV::parsed_result
  end
...

  private

  def check_file_format
    FasterCSV::parse_and_validate(file)
    errors.add :file, "Malformed CSV! Please check syntax" unless FasterCSV::is_valid?
  end
end

In the above case, you might want to move stuff into a different class that takes care of communicating with FasterCSV and stashing away the parsed result, because I don't think my example is thread safe :)

xinit
  • 1,916
  • 15
  • 12
  • If I'm reading FasterCSV docs correctly, it does not FasterCSV::parse doesn't allocate any memory for the parsed result if you give it a block (hence the `{ |row| }`). On the other hand, invalid files, at least on my case, are detected very early; usually on the first 2 lines. I don't actually mind the speed loss of parsing the file twice on this particular case. But thread safety is important. – kikito May 11 '11 at 10:26
  • Downvoted because your code sets instance variables on the class. If you have two CSV files, validating the second would destroy any ability to check the validity of the first. – Phrogz May 11 '11 at 16:10
  • @Phrogz You are right.. I kind of hinted at this with my last line. – xinit May 12 '11 at 08:42
  • @egarcia I wasn't afraid of memory usage, but with large or many files, you might notice. – xinit May 12 '11 at 08:43