I have some .txt files that start with a lot of junk; the useful part only begins after 20 to 30 lines. I want to use the last line before the numbers as my header. If I knew the exact line number, I could set that line as the header with pd.read_csv, but the number is different for each file (as I said, it's somewhere between 20 and 30). I do know that the line I am looking for starts with "Potential". Is there an easy way to make pd.read_csv use that line as the header from the start?
-
Possible duplicate of [Convert row to column header for Pandas DataFrame,](https://stackoverflow.com/questions/26147180/convert-row-to-column-header-for-pandas-dataframe) – Edeki Okoh May 06 '19 at 19:14
-
@EdekiOkoh Does not look like a dupe of that (but still can be a dupe of something else). – DYZ May 06 '19 at 19:16
-
Any chance that all of the header lines start with some specific single character? – ALollz May 06 '19 at 19:39
1 Answer
5
You can read the top of the file using "traditional" file I/O methods and count the rows until you find the header row. Once you know its number, reread the file with pandas.read_csv().
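import pandas as pd

# Find the 0-based index of the line that starts with "Potential".
with open(yourfile) as infile:
    for n, row in enumerate(infile):
        if row.startswith("Potential"):
            break

# Skip everything before that line; pandas then uses it as the header.
df = pd.read_csv(yourfile, skiprows=n)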
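
Since the same steps have to be repeated for every file, they could be wrapped in a small helper. The following is only a sketch: the function names `find_header_row` and `read_with_detected_header` and the `prefix` parameter are made up for illustration, and the code assumes the header line really does start with the given prefix. Unlike the loop above, it raises an error when no such line exists instead of silently using the last line of the file.

```python
import pandas as pd


def find_header_row(path, prefix="Potential"):
    """Return the 0-based index of the first line starting with `prefix`."""
    with open(path) as infile:
        for n, row in enumerate(infile):
            if row.startswith(prefix):
                return n
    # Hypothetical safeguard: fail loudly if the marker line is missing.
    raise ValueError(f"no line starting with {prefix!r} in {path}")


def read_with_detected_header(path, prefix="Potential", **kwargs):
    """Skip the preamble and let pandas use the detected line as the header.

    Extra keyword arguments are passed straight through to pd.read_csv.
    """
    return pd.read_csv(path, skiprows=find_header_row(path, prefix), **kwargs)
```

If the data columns are separated by whitespace rather than commas, a separator can be passed through, e.g. `read_with_detected_header(yourfile, sep=r"\s+")`.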

DYZ
-
That's a very cool method! Will defo be using it, my thanks to you sir. – Umar.H May 06 '19 at 19:30
-
@piRSquared I would of course accept it if I were the question asker, but in this case I'm not! – Umar.H May 06 '19 at 20:21