TL;DR: How can I get the raw input line (not the line number) while parsing a CSV file?
I'm parsing a delimited file with Ruby's CSV class. I'd like to retrieve the raw line from the file for each row, in addition to the parsed fields from that row.
Here is what I have now:
CSV.foreach(input_file, csv_params) do |row|
  add_uploaded_user(row)
end
That works perfectly. Every file is parsed correctly, and add_uploaded_user does what it is supposed to.
We are getting some unusual files from one client, with unexpected user names in the data. The file is valid CSV and parses correctly. They claim we are messing up their records, so we want to capture each raw line from the file before it is parsed. We already save the whole CSV file, but it is inconvenient to manually pull the file and find the source record when we get a complaint. We'd like to give them a tool so they can verify exactly what they sent us. Also, we cannot reveal other records from that file to the user in question, so we cannot share the entire file.
So, we'd like to capture the raw line of input with each parsed record we create from their file. Something like this:
CSV.foreach(input_file, csv_params) do |row|
  add_uploaded_user(row, row.raw_line)
end
...where raw_line is some method/attribute/helper from CSV that reveals the line that was just parsed.
I've gone through the CSV docs, and found https://ruby-doc.org/stdlib-2.6.1/libdoc/csv/rdoc/CSV.html#method-i-line :
line()
- The last row read from this file.
But I can't figure out how to call line(). I've tried several invocations, and they all fail the same way, with NoMethodError: undefined method 'line', whether I call it on the CSV class or on a CSV instance:
irb(main):022:0> CSV.line
NoMethodError: undefined method 'line' for CSV:Class
irb(main):049:0* csv = CSV.new("a,b,c\n1,2,3\n")
=> <#CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
irb(main):050:0> csv.each do |row|
irb(main):051:1* puts row
irb(main):052:1> puts csv.line
irb(main):053:1> end
a
b
c
NoMethodError: undefined method 'line' for #<CSV:0x00007feeb25de3c0>
from (irb):52:in 'block in irb_binding'
from (irb):50
irb(main):054:0>
And a simpler example, reading an actual file:
irb(main):055:0> csv = CSV.new(File.open('3_licenses.csv'))
=> <#CSV io_type:File io_path:"3_licenses.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\r\n" quote_char:"\"">
irb(main):062:0> csv.shift
=> ["first_name", "last_name", "license_number"]
irb(main):063:0> csv.shift
=> ["David ", "Hempy", "1001"]
irb(main):064:0> csv.line
NoMethodError: undefined method 'line' for #<CSV:0x00007feeb2591020>
from (irb):64
irb(main):065:0> csv.shift
=> ["Santa", "Claus", "np.1"]
UPDATE:
The docs I was reading were for 2.6. I'm running Ruby 2.4.5, but it looks like the method was there then, as well: https://ruby-doc.com/stdlib-2.4.5/libdoc/csv/rdoc/CSV.html#method-i-line . Interestingly, .line is not mentioned in https://docs.ruby-lang.org/en/2.4.0/CSV.html Hmm....
Also, I don't need the line number -- I need the raw line from the input file.
At this point, I'm about ready to just read the lines myself, then call CSV separately for each line. That will certainly work and put me in control...but I'm still confused about why I can't call the .line() method described in the docs. If anyone can see why I'm getting "undefined method 'line'", I'd surely appreciate it.
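For reference, here is a rough sketch of that fallback (reading each line myself and parsing it separately). The helper name each_row_with_raw_line is made up for illustration. It assumes that no quoted field contains an embedded newline, since such a record would be split across physical lines, and that the options passed through do not include headers: true, since every line is parsed on its own.

```ruby
require "csv"

# Hypothetical helper (not part of the CSV library): yields the parsed
# fields together with the raw text of each physical line in the file.
# Assumption: one record per physical line, i.e. no quoted field
# contains an embedded newline.
def each_row_with_raw_line(path, **csv_options)
  File.foreach(path) do |raw_line|
    # Parse this single line with the same options used elsewhere.
    fields = CSV.parse_line(raw_line, **csv_options)
    # chomp drops only the trailing line terminator from the raw text.
    yield fields, raw_line.chomp
  end
end

# Intended usage, mirroring the loop above:
#
#   each_row_with_raw_line(input_file) do |row, raw_line|
#     add_uploaded_user(row, raw_line)
#   end
```

This gives up CSV's own handling of multi-line records, so it is only a workaround for files where each record sits on a single line.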