I have some txt files that start with a lot of junk; the useful part only begins after 20 to 30 lines. I want to use the last line before the numbers as my header. I know that if I knew the exact line number I could set it as the header (using pd.read_csv), but that number is different for each file (as I said, it's somewhere between 20 and 30). I do know that the line I am looking for starts with "Potential". Is there an easy way to use pd.read_csv and set the header correctly from the beginning?

glibdud
  • Possible duplicate of [Convert row to column header for Pandas DataFrame](https://stackoverflow.com/questions/26147180/convert-row-to-column-header-for-pandas-dataframe) – Edeki Okoh May 06 '19 at 19:14
  • @EdekiOkoh Does not look like a dupe of that (but still can be a dupe of something else). – DYZ May 06 '19 at 19:16
  • Any chance that all of the header lines start with some specific single character? – ALollz May 06 '19 at 19:39

1 Answer

You can read the top of the file using "traditional" file I/O methods and count the rows until you find the header row. Once you know its number, reread the file with pandas.read_csv().

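import pandas as pd

# Find the 0-based index of the first line that starts with "Potential"
with open(yourfile) as infile:
    for n, row in enumerate(infile):
        if row.startswith("Potential"):
            break
    else:
        raise ValueError("no line starting with 'Potential' found")

# Skip everything before that line so it becomes the header row
df = pd.read_csv(yourfile, skiprows=n)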
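
To try this without a real file, here is a minimal sketch of the same approach, assuming comma-separated data. The helper name `find_header_row` and the sample text are made up for illustration, and `io.StringIO` stands in for the file on disk.

```python
import io
import pandas as pd

def find_header_row(lines, prefix="Potential"):
    """Return the 0-based index of the first line starting with prefix.

    Hypothetical helper: `lines` is any iterable of text lines,
    such as an open file object.
    """
    for n, row in enumerate(lines):
        if row.startswith(prefix):
            return n
    raise ValueError(f"no line starts with {prefix!r}")

# Made-up sample standing in for one of the txt files (assumed comma-separated)
sample = (
    "instrument log\n"
    "run 12\n"
    "more preamble\n"
    "Potential,Current\n"
    "0.1,1.5\n"
    "0.2,2.5\n"
)

# First pass: locate the header line
n = find_header_row(io.StringIO(sample))

# Second pass: skip the n lines before it, so it becomes the header
df = pd.read_csv(io.StringIO(sample), skiprows=n)
print(df)
```

If your files are not comma-separated, you would also need to pass the matching `sep` argument to `pd.read_csv`.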
DYZ