60

I'm trying to work out how to get the current line/row number from Ruby CSV. This is my code:

options = {:encoding => 'UTF-8', :skip_blanks => true}
CSV.foreach("data.csv", options, ) do |row, i|
   puts i
end

But this doesn't seem to work as expected. Is there a way to do this?

the Tin Man
user1513388

4 Answers

139

Because of changes to CSV in current Rubies, the original solution needs updating. The original solution, for Ruby prior to 2.6, is farther down in this answer, along with with_index, which continues to work regardless of the version.

For 2.6+ this'll work:

require 'csv'

puts RUBY_VERSION

csv_file = CSV.open('test.csv')
csv_file.each do |csv_row|
  puts '%i %s' % [csv_file.lineno, csv_row]
end
csv_file.close

If I read:

Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!\nair, moon roof, loaded",4799.00

The code results in this output:

2.6.3
1 ["Year", "Make", "Model", "Description", "Price"]
2 ["1997", "Ford", "E350", "ac, abs, moon", "3000.00"]
3 ["1999", "Chevy", "Venture \"Extended Edition\"", "", "4900.00"]
4 ["1999", "Chevy", "Venture \"Extended Edition, Very Large\"", "", "5000.00"]
5 ["1996", "Jeep", "Grand Cherokee", "MUST SELL!\\nair, moon roof, loaded", "4799.00"]

The change is because we have to get access to the current file handle. Previously we could use the global $., which always had a possibility of failure because globals can get stomped on by other sections of called code. If we have the handle of the file being opened, then we can use lineno without that concern.
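
A minimal sketch of the same handle-based approach, assuming we wrap it in a hypothetical helper named each_row_with_lineno (not part of CSV) and use the block form of CSV.open so the file is closed for us. The sample data is made up for illustration:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper (not part of CSV): yields each row together with the
# line number reported by the CSV handle that is reading the file.
def each_row_with_lineno(path)
  return enum_for(:each_row_with_lineno, path) unless block_given?

  # The block form of CSV.open closes the file when the block returns.
  CSV.open(path) do |csv|
    csv.each { |row| yield row, csv.lineno }
  end
end

# Made-up sample data, written to a temporary file for the demonstration.
Tempfile.create(['sample', '.csv']) do |f|
  f.write("Year,Make\n1997,Ford\n1999,Chevy\n")
  f.flush
  each_row_with_lineno(f.path) { |row, lineno| puts "#{lineno} #{row.inspect}" }
end
```

Because lineno is read from the handle itself, nothing else in the program can overwrite it the way a global can be overwritten.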


$.

Before Ruby 2.6 we could rely on $., a magic variable that holds the line number of the last line read from the current file:

require 'csv'

CSV.foreach('test.csv') do |csv|
  puts $.
end

With the code above, I get:

1
2
3
4
5

$INPUT_LINE_NUMBER

$. is used all the time in Perl. In Ruby, it's recommended we use it the following way to avoid the "magical" side of it:

require 'English'

puts $INPUT_LINE_NUMBER
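
As a rough sketch of what the alias refers to, here it is with a plain file rather than CSV. The helper name input_line_numbers and the sample text are assumptions made for illustration:

```ruby
require 'English'
require 'tempfile'

# Hypothetical helper: records $INPUT_LINE_NUMBER (the English alias for $.)
# after each line read from a plain text file.
def input_line_numbers(path)
  numbers = []
  File.open(path) do |file|
    file.each_line { numbers << $INPUT_LINE_NUMBER }
  end
  numbers
end

# Made-up sample data, written to a temporary file for the demonstration.
Tempfile.create(['sample', '.txt']) do |f|
  f.write("first\nsecond\nthird\n")
  f.flush
  p input_line_numbers(f.path)
end
```

The value is global state, so any other code that reads from an IO object in between can change it.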

If it's necessary to deal with embedded line-ends in fields, it's easily handled by a minor modification. Assuming a CSV file "test.csv" which contains a line with an embedded new-line:

Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00

with_index

Using Enumerator's with_index(1) makes it easy to keep track of the number of times CSV yields to the block, effectively simulating using $. but honoring CSV's work when reading the extra lines necessary to deal with the line-ends:

require 'csv'

CSV.foreach('test.csv', headers: true).with_index(1) do |row, ln|
  puts '%-3d %-5s %-26s %s' % [ln, *row.values_at('Make', 'Model', 'Description')]
end

Which, when run, outputs:

$ ruby test.rb
1   Ford  E350                       ac, abs, moon
2   Chevy Venture "Extended Edition"
3   Jeep  Grand Cherokee             MUST SELL!
air, moon roof, loaded
4   Chevy Venture "Extended Edition, Very Large"
the Tin Man
  • 2
    this had unexpected results in rspec – James Sep 19 '13 at 17:15
  • This wasn't written to work in rspec. It works in regular Ruby. – the Tin Man Sep 19 '13 at 19:10
  • Any idea what happens when you use the `headers: true` option? Is the first line still `1` or is it `2`? – Joshua Pinter Oct 25 '13 at 06:19
  • 3
    FYI, if you use the `headers: true` option, the first *row* returns `2`. – Joshua Pinter Oct 25 '13 at 06:52
  • If you use `headers: true` it returns `2` because it's reading the second row. Why wouldn't it? The first row was read by CSV as a header. Think about it. – the Tin Man Oct 20 '15 at 01:39
  • Rails is Ruby so it works, so you are doing something wrong. If it's important to your code I'd suggest asking a question. – the Tin Man Oct 20 '15 at 01:40
  • Don't use this. Use each_with_index as described by Josh Voigts below. Requiring "english" does not make it less magic, it just gives a name to it. – Pascal Dec 14 '15 at 15:48
  • 4
    Rather than say "don't use this", explain why in a separate answer. Help educate. – the Tin Man Dec 14 '15 at 20:32
  • 3
    The problem with using `$.` is it's the **line number**, which does not always reflect the **row number** since a cell can contain multiple lines. – sshaw Mar 29 '16 at 21:43
  • Yes, that's possible, but it's an easy work-around. See the code. – the Tin Man Mar 29 '16 at 22:45
  • `$.` and `$INPUT_LINE_NUMBER` worked for us with Ruby 2.5.0 but not after upgrading to 2.6.2, it just keeps returning 1. Had to use `.with_index` instead. – kaydanzie Mar 25 '19 at 20:08
  • Yes, `$.` and `$INPUT_LINE_NUMBER` are globals. See the added information in the answer. `with_index` works, or using `FILE.lineno`. It's always possible that embedded carriage returns in the CSV file or certain options to CSV's reading will confuse tracking line numbers, at which point it becomes an exercise for the developer to figure out because Ruby won't be able to read their mind. – the Tin Man May 30 '19 at 20:22
36

Here's an alternative solution:

require 'csv'

options = {:encoding => 'UTF-8', :skip_blanks => true}

# **options passes the hash as keyword arguments, which Ruby 3+ requires.
CSV.foreach("data.csv", **options).with_index do |row, i|
   puts i
end
Josh Voigts
  • 5
    Yes, this works and is clean. But it requires reading the whole CSV content into RAM at once. This is what `read` does. – undur_gongor Sep 13 '12 at 18:57
  • 1
    According to the CSV documentation, `read` actually _slurps_ the data in. [http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html](http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html#method-c-read) – Josh Voigts Sep 13 '12 at 19:06
  • 6
    Are you saying that "slurp" somehow means lazy reading? I doubt that. The result of `read` is an `Array`, and as far as I know there is no way to have deferred array building in Ruby. – undur_gongor Sep 13 '12 at 19:14
  • Indeed, to *slurp* usually means *to read all at once* in this context. – sschuberth Nov 20 '15 at 09:52
  • 2
    Instead of `each_with_index`, you should use `with_index`. `each_with_index` would require Ruby to load the entire file. – the Tin Man Mar 29 '16 at 22:46
  • 2
    Updated example to use `with_index` instead. – Josh Voigts Apr 01 '16 at 03:10
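
The difference discussed in these comments can be sketched roughly as follows, assuming a small made-up file. Without a block, CSV.foreach returns an Enumerator that hands rows over one at a time, while CSV.read builds the whole Array first:

```ruby
require 'csv'
require 'tempfile'

# Made-up sample data, written to a temporary file for the demonstration.
Tempfile.create(['sample', '.csv']) do |f|
  f.write("a,b\n1,2\n3,4\n")
  f.flush

  # Without a block, CSV.foreach returns an Enumerator; rows are parsed
  # one at a time as with_index pulls them.
  streamed = CSV.foreach(f.path)
  streamed.with_index(1) { |row, i| puts "#{i}: #{row.inspect}" }

  # CSV.read parses the whole file up front and returns an Array of rows.
  slurped = CSV.read(f.path)
  slurped.each_with_index { |row, i| puts "#{i + 1}: #{row.inspect}" }
end
```

Both loops print the same numbered rows; only the memory behaviour differs, which matters for large files.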
6

Not a clean solution, but a simple one:

require 'csv'

options = {:encoding => 'UTF-8', :skip_blanks => true}
i = 0
# **options passes the hash as keyword arguments, which Ruby 3+ requires.
CSV.foreach("data.csv", **options) do |row|
  puts i
  i += 1
end
undur_gongor
  • As far as I know this is the only way to get the row number, not the line number **but**, you should offset `i` by 1 if `:headers => true`. – sshaw Mar 29 '16 at 21:51
  • 2
    Using `with_index` would be a cleaner, more Ruby-like, solution. – the Tin Man Mar 29 '16 at 22:45
5

Ruby 2.6+

Without Headers

CSV.foreach( "data.csv", encoding: "UTF-8" ).with_index do |row, row_number|
  puts row_number # Starts at 0; use with_index( 1 ) to number the first row as 1.
end

With Headers

CSV.foreach( "data.csv", encoding: "UTF-8", headers: true ).with_index( 2 ) do |row, row_number|
  puts row_number # Starts at row 2, which is the first row after the header row.
end

In Ruby 2.6, $INPUT_LINE_NUMBER no longer gives you the current line number. What's worse is that it's returning values of 2 and 1. I'm not sure what that is supposed to represent but it's certainly not the row number. Since it doesn't raise an exception, it can really bite you if you're not checking that value. I highly recommend you replace all occurrences of $INPUT_LINE_NUMBER in your code to avoid this gotcha.
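
One possible way to fold the two cases above into a single call is sketched below. The helper name numbered_rows and the sample data are assumptions made for illustration, not part of CSV:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: returns an Enumerator of [row, row_number] pairs,
# numbering from 2 when the first line is consumed as a header row.
def numbered_rows(path, headers: false, **options)
  first_row_number = headers ? 2 : 1
  CSV.foreach(path, headers: headers, **options).with_index(first_row_number)
end

# Made-up sample data, written to a temporary file for the demonstration.
Tempfile.create(['sample', '.csv']) do |f|
  f.write("Make,Model\nFord,E350\nChevy,Venture\n")
  f.flush

  numbered_rows(f.path, headers: true).each do |row, row_number|
    puts "#{row_number} #{row['Make']} #{row['Model']}"
  end
end
```

These numbers count rows yielded by CSV, so they match the physical line numbers in the file only when no field spans several lines and no blank lines are skipped.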

Joshua Pinter