How to ignore the first line of data when processing CSV data?

Question

I am asking Python to print the minimum number from a column of CSV data, but the top row holds the column numbers (a header row), and I don't want Python to take the top row into account. How can I make sure Python ignores the first line?

This is the code so far:

import csv

with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1                
    datatype = float          
    data = (datatype(column) for row in incsv)   
    least_value = min(data)

print least_value

Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.

Are you aware that you're just creating a generator that returns a `1.0` for each line in your file and then taking the minimum, which is going to be `1.0`? — Wooble, Jul 05 '12 at 17:24
@Wooble good catch - ...`datatype(row[column])`... is what I guess the OP is trying to achieve though — Jon Clements, Jul 05 '12 at 17:36
I had someone write up that code for me and didn't catch that, so thanks, haha! — Jul 05 '12 at 18:41

martineau · Accepted Answer · score 119 · 2019-10-08T16:08:28.520

You could use an instance of the csv module's Sniffer class to deduce the format of a CSV file and detect whether a header row is present, and then use the built-in next() function to skip over the first row only when necessary:

import csv

with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)

print(least_value)
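To try the Sniffer approach without creating all16.csv, here is a minimal sketch of the same logic run against an in-memory io.StringIO. The sample data and the helper name least_in_column are made up for illustration, and since has_header() is a heuristic, it may guess differently on other data.

```python
import csv
import io

# Hypothetical sample data standing in for all16.csv.
SAMPLE = "name,value\na,1.5\nb,2.0\nc,0.5\n"


def least_in_column(file, column=1, datatype=float):
    """Return the smallest value in `column`, skipping a header row if one is detected."""
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind so the reader starts at the first line again.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip the header row.
    return min(datatype(row[column]) for row in reader)


print(least_in_column(io.StringIO(SAMPLE)))  # 0.5
```

With a real file you would pass the object returned by open('all16.csv', newline='') instead of the StringIO.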

Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:

    data = (float(row[1]) for row in reader)

Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:

with open('all16.csv', 'rb') as file:

edited Oct 08 '19 at 16:08

answered Jul 05 '12 at 18:11

martineau

Instead of `has_header(file.read(1024))`, does it make sense to write `has_header(file.readline())` ? I see that a lot, but I don't understand how `has_header()` could detect whether or not there's a header from a single line of the CSV file... – Anto Jan 12 '18 at 17:40
@Anto: The code in my answer is based on the "example for Sniffer use" in the [documentation](https://docs.python.org/3/library/csv.html#csv.Sniffer), so I assume it's the prescribed way to do it. I agree that doing it on the basis of one line of data doesn't seem like it would always be enough data to make such a determination—but I have no idea since _how_ the `Sniffer` works isn't described. FWIW I've **never** seen `has_header(file.readline())` being used and even if it worked most of time, I would be highly suspicious of the approach for the reasons stated. – martineau Jan 12 '18 at 18:40
Thanks for your input. Nevertheless it seems that using `file.read(1024)` [generates errors in python's csv lib](https://stackoverflow.com/a/35757505/1030960): . See also [here](https://github.com/xesscorp/KiField/issues/17#issuecomment-262871084) for instance. – Anto Jan 15 '18 at 19:58
@Anto: I've never encountered such an error (1024 bytes is not a lot of memory after all), nor has it been a problem for many other folks based on the up-votes this answer has received (as well as the thousands of people who have read and followed the documentation). For those reasons I strongly suspect something else is causing your issue. – martineau Jan 15 '18 at 20:03
I ran into this exact same error as soon as I switched from `readline()` to `read(1024)`. So far I've only managed to find people who have switched to readline to solve the csv.dialect issue. – Anto Jan 15 '18 at 20:35
@Anto: In that case I suggest you post your own question about why this is happening. Be sure to include the least amount of code that, when run, will reproduce the problem. See [How to create a Minimal, Complete, and Verifiable Example](https://stackoverflow.com/help/mcve). – martineau Jan 15 '18 at 21:14
How do I skip the first row if I want to use `csv.DictReader` instead of `csv.reader`? In my case the CSV starts with a row that explains to humans reading the sheet what each column means. – Boris Verkhovskiy Apr 19 '21 at 22:39
@Boris: If the file has more than one row/line at the beginning with the fieldnames in it — i.e. the optional so-called "header" row — then it's not in a valid CSV format. About the only thing I can think of is that you may be able to call `next(file)` right after opening it to skip over that if you know for certain it will be there. You will also need to rewind the file back to that point *plus* any standard normal header that was detected by the `csv.Sniffer` before passing it to `csv.reader`. – martineau Apr 20 '21 at 00:12

score 87 · Answer 2 · answered Jul 05 '12 at 18:15

To skip the first line just call:

next(inf)

Files in Python are iterators over lines.
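Applied to the question's code, a minimal sketch (Python 3, with an in-memory io.StringIO standing in for all16.csv) skips the first line on the file object before csv.reader sees it, and also reads row[column] instead of column, the bug pointed out in the comments under the question. As noted in the comments below, this line-based skip assumes no header field contains an embedded newline.

```python
import csv
import io

# Hypothetical stand-in for open('all16.csv', newline='').
inf = io.StringIO("col0,col1\nx,7\ny,3\nz,9\n")

next(inf)  # Files iterate over lines, so this consumes the header line.
incsv = csv.reader(inf)
column = 1
datatype = float
data = (datatype(row[column]) for row in incsv)  # row[column], not column
least_value = min(data)
print(least_value)  # 3.0
```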

answered Jul 05 '12 at 18:15

jfs

Nice summary of files in Python – bearcat Nov 29 '20 at 03:42
Can you give a link where you found this? Any link to the next() documentation that mentions the parameters of the next function? – Bluetail Jan 29 '21 at 16:46
@bluetail https://docs.python.org/3/library/functions.html#next – jfs Jan 29 '21 at 18:21
If one of the values in the first row can contain a newline `\n` character, this won't work. – Boris Verkhovskiy Apr 19 '21 at 22:37

score 43 · Answer 3 · answered Mar 31 '18 at 11:02

Borrowed from the Python Cookbook, a more concise template might look like this:

import csv

with open('stocks.csv', newline='') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)  # Read the header row so the loop starts at the data.
    for row in f_csv:
        # Process row ...
        pass
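Keeping the header from next() has a side benefit for the question: the column position can be looked up by name instead of hardcoding column = 1. A minimal sketch with hypothetical in-memory data standing in for stocks.csv (the column name "Value" is an assumption):

```python
import csv
import io

# Hypothetical in-memory stand-in for stocks.csv.
f = io.StringIO("Name,Value\nx,3.5\ny,1.25\nz,2.0\n")
f_csv = csv.reader(f)
headers = next(f_csv)  # ['Name', 'Value']

# Find the column position from the header instead of hardcoding it.
value_col = headers.index("Value")
least = min(float(row[value_col]) for row in f_csv)
print(least)  # 1.25
```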

answered Mar 31 '18 at 11:02

shin

score 25 · Answer 4 · edited May 27 '15 at 14:40

In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely: open the file, skip the first line with next(), then pass the file object to csv.DictReader.

import csv

with open('all16.csv', newline='') as tmp:
    # Skip first line (if any)
    next(tmp, None)

    # {line_num: row}
    data = dict(enumerate(csv.DictReader(tmp)))
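If several junk lines come before the real column names, the same idea extends with itertools.islice. A minimal sketch, assuming exactly two preamble lines; PREAMBLE_LINES is a hypothetical setting you would adjust for your file, and the in-memory data is made up:

```python
import csv
import io
from itertools import islice

PREAMBLE_LINES = 2  # Hypothetical: number of junk lines before the header.

tmp = io.StringIO(
    "# preamble line 1\n"
    "# preamble line 2\n"
    "id,score\n"
    "1,10\n"
    "2,4\n"
)

# Consume the preamble lines; islice stops early if the file is shorter.
for _ in islice(tmp, PREAMBLE_LINES):
    pass

# {line_num: row}, with field names taken from the 'id,score' line.
data = dict(enumerate(csv.DictReader(tmp)))
print(data[1]["score"])  # 4
```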

edited May 27 '15 at 14:40

Veedrac

answered Dec 18 '14 at 23:16

Maarten

Thanks Veedrac. Happy to learn here, can you suggest edits that would solve the problems you cite? My solution gets the job done, but it looks like it could be further improved? – Maarten May 27 '15 at 14:25
I gave you an edit that replaces the code with something that should be identical (untested). Feel free to revert if it's not in line with what you mean. I'm still not sure why you're making the `data` dictionary, nor does this answer really add anything over the accepted one. – Veedrac May 27 '15 at 14:42
Thanks Veedrac! That looks very efficient indeed. I posted my answer because the accepted one was not working for me (can't remember the reason now). What would be the problem with defining data = dict() and then immediately filling it (as compared to your suggestion)? – Maarten May 28 '15 at 18:33
It's not *wrong* to do `data = dict()` and fill it in, but it's inefficient and not idiomatic. Plus, one should use dict literals (`{}`) and `enumerate` even then. – Veedrac May 28 '15 at 19:46
FWIW, you should reply to my posts with `@Veedrac` if you want to be sure I'm notified, although Stack Overflow seems to be able to guess from the username alone. (I don't write `@Maarten` because the answerer will be notified by default.) – Veedrac May 28 '15 at 19:46
If one of the values in the first row can contain a newline `\n` character, this won't work. – Boris Verkhovskiy Apr 19 '21 at 22:37

score 19 · Answer 5 · answered Jul 05 '12 at 17:26

You would normally use next(incsv), which advances the iterator by one row, so you skip the header. The other option (say you wanted to skip 30 rows) would be:

from itertools import islice
for row in islice(incsv, 30, None):
    pass  # process row
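A self-contained sketch of the islice approach, using made-up in-memory rows and skipping 3 rows instead of 30 so the result is easy to check:

```python
import csv
import io
from itertools import islice

# Hypothetical data: 3 rows to skip, then the rows we actually want.
incsv = csv.reader(io.StringIO("h1\nh2\nh3\n5\n8\n"))

# islice(iterable, start, stop): start=3 skips the first three rows,
# and stop=None means "keep going to the end".
kept = [row for row in islice(incsv, 3, None)]
print(kept)  # [['5'], ['8']]
```

islice works lazily, so the skipped rows are parsed but never stored.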

answered Jul 05 '12 at 17:26

Jon Clements

score 8 · Answer 6 · answered Jul 05 '12 at 17:53

Use csv.DictReader instead of csv.reader. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. You would then be able to access field values using row["1"], etc.
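A minimal sketch of that idea for the question's task, assuming (as the question describes) that the first row holds the column numbers, so the second column's field name is the string "1". The in-memory data is hypothetical:

```python
import csv
import io

# Hypothetical file whose first row holds the column numbers.
csvfile = io.StringIO("0,1\n4.5,2.25\n3.0,7.5\n")

reader = csv.DictReader(csvfile)  # First row becomes the field names "0" and "1".
least_value = min(float(row["1"]) for row in reader)
print(least_value)  # 2.25
```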

answered Jul 05 '12 at 17:53

iruvar

score 5 · Answer 7 · answered Jul 26 '20 at 04:49

Python 2.x

csvreader.next()

Return the next row of the reader’s iterable object as a list, parsed according to the current dialect.

import csv
csv_data = csv.reader(open('sample.csv'))
csv_data.next() # skip first row
for row in csv_data:
    print(row) # prints every row from the second one onward

Python 3.x

csvreader.__next__()

Return the next row of the reader’s iterable object as a list (if the object was returned from reader()) or a dict (if it is a DictReader instance), parsed according to the current dialect. Usually you should call this as next(reader).

import csv
csv_data = csv.reader(open('sample.csv'))
csv_data.__next__() # skip first row
for row in csv_data:
    print(row) # prints every row from the second one onward
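Following the documentation's advice to call next(reader) rather than __next__(), the built-in also accepts a default value, which avoids a StopIteration when the file is empty. A minimal sketch with in-memory data; the helper name rows_after_header is hypothetical:

```python
import csv
import io


def rows_after_header(text):
    """Return every row except the first; tolerate an empty input."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)  # None instead of StopIteration when empty
    if header is None:
        return []
    return list(reader)


print(rows_after_header("a,b\n1,2\n3,4\n"))  # [['1', '2'], ['3', '4']]
print(rows_after_header(""))  # []
```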

Docs say "Usually you should call this as `next(reader)`." https://docs.python.org/3/library/csv.html#csv.csvreader.__next__ — jrc, Dec 22 '22 at 21:31

score 4 · Answer 8 · answered Sep 16 '20 at 01:34

This might be a very old question, but with pandas there is a very easy solution:

import pandas as pd

data=pd.read_csv('all16.csv',skiprows=1)
data['column'].min()

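With skiprows=1 the first line of the file is skipped, and the least value can then be found with data['column'].min(), where 'column' stands for the name of the column you want. Note that read_csv still treats the first line it reads as the header, so after skiprows=1 the file's second line supplies the column names; if the first line is itself the header, leaving out skiprows is already enough to exclude it from the data.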

Lassi · Answer 9 · 2018-11-13T10:37:25.943

The documentation for the Python 3 CSV module provides this example:

with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...

The Sniffer will try to auto-detect many things about the CSV file. You need to explicitly call its has_header() method to determine whether the file has a header line. If it does, then skip the first row when iterating the CSV rows. You can do it like this:

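# Inside the `with open(...) as csvfile:` block above:
sniffer = csv.Sniffer()
sample = csvfile.read(1024)
csvfile.seek(0)
reader = csv.reader(csvfile, sniffer.sniff(sample))
if sniffer.has_header(sample):
    for header_row in reader:
        break  # Consume only the header row.
for data_row in reader:
    print(data_row)  # Do something with the row.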

score 2 · Answer 10 · answered Aug 28 '14 at 15:43

The new 'pandas' package might be more relevant than 'csv'. The code below reads a CSV file, by default interpreting the first line as the column header, and finds the minimum of each column.

import pandas as pd

data = pd.read_csv('all16.csv')
data.min()

answered Aug 28 '14 at 15:43

Finn Årup Nielsen

and you can write it in one line too: `pd.read_csv('all16.csv').min()` – Finn Årup Nielsen Aug 28 '14 at 15:46

score 2 · Answer 11 · answered May 01 '18 at 18:06

Because this is related to something I was doing, I'll share here.

What if you're not sure whether there's a header, and you also don't feel like importing Sniffer and other things?

If your task is basic, such as printing or appending to a list or array, you could just use an if statement:

import csv

array = []
# Let's say there are 4 columns
with open('file.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    # Read the first line
    first_line = next(csvreader)
    # My headers were just text. You can use any suitable conditional here
    if len(first_line) == 4:
        array.append(first_line)
    # Now we'll just iterate over everything else as usual:
    for row in csvreader:
        array.append(row)
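A variant of the same conditional, assuming the data rows are purely numeric: treat the first row as a header only when one of its fields fails to parse as a number. The helper looks_like_header is hypothetical, and the rule is only a heuristic; it would misfire on numeric column names like the column numbers in the question.

```python
import csv
import io


def looks_like_header(row):
    """Heuristic: a row is a header if any field is not a number."""
    for field in row:
        try:
            float(field)
        except ValueError:
            return True
    return False


csvfile = io.StringIO("a,b,c,d\n1,2,3,4\n5,6,7,8\n")  # Hypothetical data.
csvreader = csv.reader(csvfile)
array = []
first_line = next(csvreader)
if not looks_like_header(first_line):
    array.append(first_line)  # The first line is data, so keep it.
for row in csvreader:
    array.append(row)
print(array)  # [['1', '2', '3', '4'], ['5', '6', '7', '8']]
```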

score 1 · Answer 12 · answered Dec 01 '14 at 10:18

Well, my mini wrapper library would do the job as well.

>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])

Alternatively, if you know the header name of column index 1, for example "Column 1", you can do this instead:

>>> min(data.column["Column 1"])

score 1 · Answer 13 · answered Mar 12 '18 at 12:44

For me the easiest way to go is to use range.

import csv

with open('files/filename.csv') as f:
    reader = csv.reader(f)
    fulllist = list(reader)

# Starting with data skipping header
for item in range(1, len(fulllist)):
    # Print each row using "item" as the index value
    print(fulllist[item])
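A list slice does the same job as the range-based loop without any index arithmetic. A minimal sketch, with hypothetical in-memory data in place of files/filename.csv:

```python
import csv
import io

# Hypothetical in-memory stand-in for files/filename.csv.
reader = csv.reader(io.StringIO("h1,h2\nr1,1\nr2,2\n"))
fulllist = list(reader)

# fulllist[1:] is a new list holding everything except the header row.
for row in fulllist[1:]:
    print(row)
```

Slicing an empty list simply yields an empty list, so an empty file needs no special case.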

score 1 · Answer 14 · answered Mar 27 '20 at 11:21

I would convert the csv reader to a list, then pop the first element:

import csv

with open(fileName, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    data = list(csvreader)  # Convert to list
    data.pop(0)             # Removes the first row

    for row in data:
        print(row)
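If you want to keep the header rather than discard it with pop(0), extended iterable unpacking splits it off in one step. A minimal sketch with hypothetical in-memory data; note that this raises ValueError if the file is completely empty.

```python
import csv
import io

csvreader = csv.reader(io.StringIO("name,qty\napple,3\npear,5\n"))
header, *data = csvreader  # header gets the first row, data a list of the rest
print(header)  # ['name', 'qty']
for row in data:
    print(row)
```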

score 0 · Answer 15 · answered Sep 13 '15 at 10:26

I would use tail to get rid of the unwanted first line:

tail -n +2 $INFIL | whatever_script.py
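With tail doing the skipping, the Python side only has to read rows from standard input. A minimal sketch of a function whatever_script.py (the answer's placeholder name) might contain; least_value and the default column index 1 are assumptions taken from the question, not part of the answer.

```python
import csv
import sys


def least_value(stream=sys.stdin, column=1):
    """Return the minimum of `column` over all rows read from `stream`.

    The header is assumed to have been removed upstream, e.g. by `tail -n +2`.
    """
    return min(float(row[column]) for row in csv.reader(stream))
```

The script would then end with print(least_value()) and be run as `tail -n +2 "$INFIL" | python whatever_script.py`.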

answered Sep 13 '15 at 10:26

Karel Adams

score 0 · Answer 16 · edited Feb 29 '16 at 16:37

Just add [1:] after the read_csv() call.

For example:

data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)[1:]

That works for me in IPython.

edited Feb 29 '16 at 16:37

OneCricketeer

answered Nov 01 '15 at 00:02

the curious mind

Christophe Roussy · Answer 17 · 2016-10-26T09:42:43.740

Python 3.X

Handles UTF8 BOM + HEADER

It was quite frustrating that the csv module could not easily get the header, and there is also a bug with the UTF-8 BOM (the first character in the file). This works for me using only the csv module:

import csv

def read_csv(self, csv_path, delimiter):
    with open(csv_path, newline='', encoding='utf-8') as f:
        # https://bugs.python.org/issue7185
        # Remove the UTF-8 BOM, but only if one is actually present.
        txt = f.read()
        if txt.startswith('\ufeff'):
            txt = txt[1:]

    # Separate the header line from the data lines.
    all_lines = txt.splitlines()
    header = all_lines[:1]
    lines = all_lines[1:]

    # Convert to list.
    csv_rows = list(csv.reader(lines, delimiter=delimiter))

    for row in csv_rows:
        value = row[INDEX_HERE]  # Replace INDEX_HERE with the column index you need.
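An alternative, swapped in here rather than taken from the answer: Python's 'utf-8-sig' codec drops a leading BOM if one is present and otherwise decodes like 'utf-8', so the file can go straight into csv.reader. A minimal sketch using in-memory bytes; open(csv_path, newline='', encoding='utf-8-sig') would be the on-disk equivalent, and the sample contents are hypothetical.

```python
import csv
import io

# Hypothetical file contents: a UTF-8 BOM followed by a header and two rows.
raw = b"\xef\xbb\xbfid,name\n1,alpha\n2,beta\n"

with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)  # No '\ufeff' stuck to 'id'.
    rows = list(reader)

print(header)  # ['id', 'name']
print(rows)  # [['1', 'alpha'], ['2', 'beta']]
```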

score 0 · Answer 18 · edited Jan 31 '22 at 23:32

A simple solution is to use csv.DictReader(), which takes the field names from the first row and yields only the rows after it:

import csv

def read_csv(path):
    with open(path, 'r', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            print(row["column_name"])  # Replace with your column's header name.

edited Jan 31 '22 at 23:32

ddejohn

answered Dec 21 '21 at 11:40

Smaurya
