How to skip an unknown number of empty lines before header on pandas.read_csv?

Question

I want to read a dataframe from a csv file where the header is not in the first line. For example:

In [1]: import pandas as pd

In [2]: import io

In [3]: temp=u"""#Comment 1
   ...: #Comment 2
   ...: 
   ...: #The previous line is empty
   ...: Header1|Header2|Header3
   ...: 1|2|3
   ...: 4|5|6
   ...: 7|8|9"""

In [4]: df = pd.read_csv(io.StringIO(temp), sep="|", comment="#", 
   ...:                  skiprows=4).dropna()

In [5]: df
Out[5]: 
   Header1  Header2  Header3
0        1        2        3
1        4        5        6
2        7        8        9

[3 rows x 3 columns]

The problem with the above code is that I don't now how many lines will exist before the header, therefore, I cannot use skiprows=4 as I did here.

I aware I can iterate through the file, as in the question Read pandas dataframe from csv beginning with non-fix header.

What I am looking for is a simpler solution, like making pandas.read_csv disregard any empty line and taking the first non-empty line as the header.

Sorry but doesn't `pd.read_csv(io.StringIO(temp), sep="|", comment="#")` work? — ayhan, Sep 02 '16 at 17:43
No, the comment lines are translated to empty lines, which are not disregarded. — bmello, Sep 02 '16 at 18:05
Can it be a version issue? `skip_blank_lines=True` is the default in 0.18.1. — ayhan, Sep 02 '16 at 18:06
And when I ran your example with `pd.read_csv(io.StringIO(temp), sep="|", comment="#")` it works exactly like you described in the desired output. — ayhan, Sep 02 '16 at 18:07

score 18 · Accepted Answer · answered Sep 02 '16 at 17:52

18

You need to set skip_blank_lines=True

df = pd.read_csv(io.StringIO(temp), sep="|", comment="#", skip_blank_lines=True).dropna()

answered Sep 02 '16 at 17:52

ode2k

2,653
13
20

Thank you. I suspected such an option should exist, and even check the read_csv help. I didn't find because I am using pandas 0.13. I will try to update. – bmello Sep 02 '16 at 18:11

How to skip an unknown number of empty lines before header on pandas.read_csv?

1 Answers1