
Ruby's CSV class makes it pretty easy to iterate over each row:

CSV.foreach(file) { |row| puts row }

However, this always includes the header row, so I'll get as output:

header1, header2
foo, bar
baz, yak

I don't want the headers though. Now, when I call …

CSV.foreach(file, :headers => true)

I get this result:

#<CSV::Row:0x10112e510
    @header_row = false,
    attr_reader :row = [
        [0] [
            [0] "header1",
            [1] "foo"
        ],
        [1] [
            [0] "header2",
            [1] "bar"
        ]
    ]
>

Of course, because the documentation says:

This setting causes #shift to return rows as CSV::Row objects instead of Arrays

But, how can I skip the header row, returning the row as a simple array? I don't want the complicated CSV::Row object to be returned.

I definitely don't want to do this:

first = true
CSV.foreach(file) do |row|
  if first
    puts row
    first = false
  else
    # code for other rows
  end
end
slhck

3 Answers


Look at #shift from CSV Class:

The primary read method for wrapped Strings and IOs, a single row is pulled from the data source, parsed and returned as an Array of fields (if header rows are not used)

An Example:

require 'csv'

# CSV FILE
# name, surname, location
# Mark, Needham, Sydney
# David, Smith, London

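def parse_csv_file_for_names(path_to_csv)
  names = []
  # CSV.read loads the whole file as an Array of Arrays, header row first
  csv_contents = CSV.read(path_to_csv)
  # Array#shift removes the first element, i.e. the header row
  csv_contents.shift
  csv_contents.each do |row|
    names << row[0]
  end
  names
end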
waldyr.ar
  • CSV.read returns an Array, and #shift is a standard Array method, which is very useful here. – Pritesh Jain Jul 31 '12 at 13:53
  • You can also iterate using `each_with_index` and check which line index you're on. A `next if (i == 0)` would skip the first line for index `i`. – tadman Jul 31 '12 at 15:59
  • @tadman Feel free to post that as a separate answer – looks viable. – slhck Aug 01 '12 at 11:31
  • Is this more memory intensive because you're saving the entire file contents into a variable, as opposed to reading line by line when doing foreach? – ahnbizcad Aug 30 '16 at 18:42
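
On the memory question above, here is a minimal sketch of a streaming variant. It assumes a CSV instance from CSV.open, where #shift reads a single row and #each continues with the remaining rows, so the file is never held in memory as a whole. The helper name each_data_row and the sample file contents are hypothetical, chosen only for illustration.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: yields every row after the header as a plain Array,
# reading the file one row at a time.
def each_data_row(path)
  CSV.open(path) do |csv|
    csv.shift                    # read and discard the header row
    csv.each { |row| yield row } # remaining rows arrive as plain Arrays
  end
end

# Hypothetical sample file, written only so the sketch runs on its own.
Tempfile.create(['sample', '.csv']) do |f|
  f.write("header1,header2\nfoo,bar\nbaz,yak\n")
  f.flush
  each_data_row(f.path) { |row| p row }
end
```

Because the headers option is never set, each row stays a plain Array rather than a CSV::Row.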

You might want to consider CSV.parse(csv_file, { :headers => false }) and passing a block, as mentioned here

jodell

A cool way to ignore the headers is to read it as an array and ignore the first row:

data = CSV.read("dataset.csv")[1 .. -1]
# => [["first_row", "with data"],
#     ["second_row", "and more data"],
#     ...
#     ["last_row", "finally"]]

The problem with the :headers => false approach is that CSV won't try to read the first row as a header, but will consider it part of the data. So, basically, you have a useless first row.
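
To make the difference concrete, here is a small sketch using inline data that mirrors the question's example. It also shows one possible way to get plain arrays while still letting CSV consume the header: parse with headers enabled and call CSV::Row#fields on each row. The helper name plain_rows and the data string are assumptions for illustration only.

```ruby
require 'csv'

# Hypothetical helper: lets CSV consume the header row, then converts each
# CSV::Row back into a plain Array of its values.
def plain_rows(csv_text)
  CSV.parse(csv_text, headers: true).map(&:fields)
end

# Hypothetical inline data mirroring the question's example.
data = "header1,header2\nfoo,bar\nbaz,yak\n"

# headers: false (the default) keeps the first line as an ordinary data row.
p CSV.parse(data, headers: false).first
# => ["header1", "header2"]

p plain_rows(data)
# => [["foo", "bar"], ["baz", "yak"]]
```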

Carlos Agarie
  • Nice to see a few options, but keep in mind if your CSV file is large, this isn't very memory-friendly relative to the other answers. Also, it's exactly the same answer as [this](https://stackoverflow.com/a/20623072/6243352) -- best to post once. – ggorlen Oct 05 '21 at 23:51