
TL;DR: How do I get the raw input line (not the line number) while parsing a CSV file?

I'm parsing a delimited file with Ruby's CSV class. I'd like to retrieve the raw line from the file for each row, in addition to the parsed fields from that row.

Here is what I have now:

    CSV.foreach(input_file, csv_params) do |row|
      add_uploaded_user(row)
    end

That works perfectly. Every file is parsed correctly, and add_uploaded_user does what it is supposed to.

We are getting some unusual files from one client, with unexpected user names in the data. The file is valid CSV and parses correctly. The client claims we are messing up their records, so we want to capture each raw line from the file before it is parsed. We already save the whole CSV file, but it is inconvenient to manually pull the file and find the source record whenever we get a complaint. We'd like to give them a tool so they can verify exactly what they sent us. Also, we cannot reveal other records from that file to the user in question, so we cannot share the entire file.

So, we'd like to capture the raw line of input with each parsed record we create from their file. Something like this:

    CSV.foreach(input_file, csv_params) do |row|
      add_uploaded_user(row, row.raw_line)
    end

...where raw_line is some method/attribute/helper from CSV that reveals the line that was just parsed.

I've gone through the CSV docs, and found https://ruby-doc.org/stdlib-2.6.1/libdoc/csv/rdoc/CSV.html#method-i-line :

  • line() - The last row read from this file.

But I can't figure out how to call line(). I've tried several invocations, and they all fail in pretty much the same way, with NoMethodError: undefined method 'line' (for CSV:Class, or for the CSV instance):

    irb(main):022:0> CSV.line
    NoMethodError: undefined method 'line' for CSV:Class

    irb(main):049:0* csv = CSV.new("a,b,c\n1,2,3\n")
    => <#CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
    irb(main):050:0> csv.each do |row|
    irb(main):051:1*   puts row
    irb(main):052:1>   puts csv.line
    irb(main):053:1> end
    a
    b
    c
    NoMethodError: undefined method 'line' for #<CSV:0x00007feeb25de3c0>
      from (irb):52:in 'block in irb_binding'
      from (irb):50
    irb(main):054:0>

And a simpler example, reading an actual file:

    irb(main):055:0> csv = CSV.new(File.open('3_licenses.csv'))
    => <#CSV io_type:File io_path:"3_licenses.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\r\n" quote_char:"\"">
    irb(main):062:0> csv.shift
    => ["first_name", "last_name", "license_number"]
    irb(main):063:0> csv.shift
    => ["David ", "Hempy", "1001"]
    irb(main):064:0> csv.line
    NoMethodError: undefined method 'line' for #<CSV:0x00007feeb2591020>
      from (irb):64
    irb(main):065:0> csv.shift
    => ["Santa", "Claus", "np.1"]

UPDATE:

The docs I was reading were for 2.6. I'm running Ruby 2.4.5, but it looks like the method was documented there as well: https://ruby-doc.com/stdlib-2.4.5/libdoc/csv/rdoc/CSV.html#method-i-line . Interestingly, .line is not mentioned at all in https://docs.ruby-lang.org/en/2.4.0/CSV.html . Hmm...
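One way to see why the call fails is to ask Ruby directly instead of trusting a docs page. The sketch below assumes only that `csv` can be required. It prints the loaded csv library's version and checks both the class and its instances for `line`. The class itself should never respond to `line`, because the docs describe it as an instance method (`CSV#line`). Whether instances have it depends on the csv library actually loaded at runtime, not on which documentation page you happen to be reading.

```ruby
require 'csv'

# Report the csv library version if the constant is defined
# (the csv 3.x gem defines CSV::VERSION as a version string).
version = defined?(CSV::VERSION) ? CSV::VERSION : 'unknown'
puts "csv library version:        #{version}"

# Class-level check: `line` is documented as CSV#line, an instance method,
# so the class does not respond to it; this is why `CSV.line` raises NoMethodError.
puts "CSV.respond_to?(:line):     #{CSV.respond_to?(:line)}"

# Instance-level check: true only if the loaded library implements CSV#line.
puts "CSV.method_defined?(:line): #{CSV.method_defined?(:line)}"
```

If the last line prints `false` on Ruby 2.4.5, the loaded library most likely predates `CSV#line`. That would be consistent with the later remark in the comments that it worked on Ruby 2.7.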

Also, I don't need the line number -- I need the raw line from the input file.
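The difference between the two methods shows up on a small in-memory input. This sketch assumes a csv library that implements `CSV#line` (for example, the one bundled with a recent Ruby). `lineno` counts the rows read so far, while `line` returns the raw text of the most recent row, including its row separator.

```ruby
require 'csv'

csv = CSV.new("foo,0\nbar,1\n")
seen = []
csv.each do |_row|
  # lineno: number of rows read so far; line: raw text of the last row read
  seen << [csv.lineno, csv.line]
end
p seen # => [[1, "foo,0\n"], [2, "bar,1\n"]]
```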

At this point, I'm about ready to just read the lines myself, then call CSV separately for each line. That will certainly work and put me in control...but I'm still confused why I can't call the .line() method described in the docs. If anyone can see why I'm getting "undefined method 'line'", I'd surely appreciate it.
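Reading the lines yourself does work without `CSV#line`, with one caveat. A quoted field may legitimately contain a newline, so one physical line is not always one record, and calling `CSV.parse_line` on each physical line separately would fail on such records. The sketch below assumes the input is an IO (a `File` or `StringIO`). It accumulates physical lines until `CSV.parse_line` stops raising `CSV::MalformedCSVError`, then yields the parsed fields together with the exact raw text. The method name `each_record_with_source` is hypothetical.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: yields (fields, raw_text) for each CSV record.
# Relies only on CSV.parse_line, so it does not need CSV#line.
def each_record_with_source(io, **csv_params)
  pending = [] # physical lines belonging to the current record
  io.each_line do |physical_line|
    pending << physical_line
    raw = pending.join
    begin
      fields = CSV.parse_line(raw, **csv_params)
    rescue CSV::MalformedCSVError
      # Most likely an unclosed quote: the record continues on the next line.
      next
    end
    yield fields, raw
    pending.clear
  end
  # Leftover text never parsed; re-parse it to surface the original error.
  CSV.parse_line(pending.join, **csv_params) unless pending.empty?
end

input = StringIO.new("name,note\nDavid,\"two\nlines\"\n")
records = []
each_record_with_source(input) { |fields, raw| records << [fields, raw] }
records.each { |fields, raw| puts "#{fields.inspect} <= #{raw.inspect}" }
```

The trade-off of this design is that a genuinely malformed record (for example, a stray quote) absorbs the rest of the input before the error surfaces at end of file. Capping the number of accumulated lines per record would be a reasonable refinement.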

David Hempy
  • Maybe you are looking for this: https://ruby-doc.org/core-2.7.0/Enumerable.html#method-i-each_with_index? See also: https://stackoverflow.com/questions/12407035/ruby-csv-get-current-line-row-number – iGian May 21 '20 at 18:14
  • Look at the [documentation](https://ruby-doc.org/stdlib-2.7.1/libdoc/csv/rdoc/CSV.html) for CSV. It has many easy to follow examples that cover many different use-cases. – the Tin Man May 21 '20 at 19:36
  • Your question isn't asked well. See "[ask]", "[Stack Overflow question checklist](https://meta.stackoverflow.com/questions/260648)" and "[MCVE](https://stackoverflow.com/help/minimal-reproducible-example)" and all their linked pages. We need a tiny example of the CSV input, the code rewritten to access that input, and your expected output. We have no idea what `add_uploaded_user` is, we can't run your code, nor without input or your expected output, can we tell what works or meets your needs. – the Tin Man May 21 '20 at 22:39
  • I guess I wasn't clear in my intent...the parsing of the file works fine, and add_uploaded_user() is irrelevant...let's pretend I called it "foobar()" instead of "add_uploaded_user". The focus of my question is only whether it is possible to see the raw input line, in addition to the array of parsed fields. We want that so that when a customer calls foul months later, we can show them the exact source they sent us in the CSV file, for only that single record. (I'll edit the question to make that clearer.) – David Hempy May 22 '20 at 03:42

1 Answer


When the documentation refers to CSV#line, it means you have to call it on an instance of CSV:

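    require 'csv'

    # CSV#line is an instance method: call it on the csv object, not on the CSV class.
    # The block form of CSV.open closes the file when the block finishes.
    CSV.open('example.csv') do |csv|
      csv.each do |row|
        p csv.line # raw text of the row just read
      end
    end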
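Applied to the question's use case, the instance is what makes the raw line reachable: `CSV.foreach` only yields rows, so the loop has to hold its own `CSV` object. The sketch below assumes a csv library that implements `CSV#line` (as on the Ruby 2.7 mentioned in the comments). `each_row_with_line` is a hypothetical helper; `add_uploaded_user`, `input_file`, and `csv_params` are the names from the question, with `csv_params` assumed to be a Hash with symbol keys.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: yields each parsed row plus the raw text it came from.
# Requires a csv library that implements CSV#line.
def each_row_with_line(io, **csv_params)
  csv = CSV.new(io, **csv_params)
  csv.each { |row| yield row, csv.line }
end

# In the question's code this would become something like:
#   File.open(input_file) do |file|
#     each_row_with_line(file, **csv_params) { |row, raw| add_uploaded_user(row, raw) }
#   end
# Here a StringIO stands in for the uploaded file.
rows = []
each_row_with_line(StringIO.new("first_name,last_name\nSanta,Claus\n")) do |row, raw|
  rows << [row, raw]
end
p rows
```

If the real `csv_params` include `headers: true`, each row arrives as a `CSV::Row` object rather than an Array.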
tadman
  • Thanks, tadman. I copied your code and tried some variants of it, and I am still getting `NoMethodError: undefined method 'line'`. Please see the last code block I just added to the question for an example. Any additional help is welcome. – David Hempy May 22 '20 at 04:19
    This worked in Ruby 2.7. You may need to upgrade if you want to use it, or find a CSV gem for 2.4. – tadman May 22 '20 at 04:21