3

I am trying to get an index or row number for the row that holds the headers in my CSV file. The issue is, the header row can move up and down depending on the output of the report from our system (I have no control to change this)

code:

ht = pd.read_csv(file.csv)
test = ht.get_loc('Code') #Code being header im using to locate the header row
csv1 = read_csv(file.csv, header=test)
df1 = df1.append(csv1) #Appending as have many files

If I was to print test, I would expect a number around 4 or 5, and that's what I am feeding into the second read "read_csv"

The error I'm getting is that it's expecting 1 header column, but I have 26 columns. I am just trying to use the first header string to get the row number

Thanks :-)

Edit:

CSV format

This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18

as you will see the "The deadlines" rows are the same, this can be 3 or 5 based on the code ids, thus the header row can change up or down.

I also did not write out all 26 column headers, not sure that matters.

Wanted DF format

index |    code         |   type   | arrived_date | est_del_date
1     | a/wrwgwr12/001  |   kids   |   12-dec-18  | 17-dec-18
2     | aa/gjghgj35/030 |  Pet     |  15-dec-18   | 18-dec-18

Hope this makes sense..

Thanks,

Nick ODell
  • 15,465
  • 3
  • 32
  • 66
Runawaygeek
  • 115
  • 3
  • 13

1 Answers1

4

You can use the csv module to find the first row which contains a delimiter, then feed the index of this row as the skiprows parameter to pd.read_csv:

from io import StringIO
import csv
import pandas as pd

x = """This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18"""

# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
    reader = csv.reader(fin)
    idx = next(idx for idx, row in enumerate(reader) if len(row) > 1)  # 4

# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), skiprows=idx)

print(df)

              code  type arrived_date est_del_date
0   a/wrwgwr12/001  kids    12-dec-18    17-dec-18
1  aa/gjghgj35/030   pet    15-dec-18    18-dec-18
jpp
  • 159,742
  • 34
  • 281
  • 339
  • Thank you for this. So, I wrote this out as a test script, works perfectly. When I add it to my final script though, I get an error: "csvreader = csv.reader(fin) AttributeError: 'list' object has no attribute 'reader'" No idea why yet, but I don't think its the code. – Runawaygeek Dec 19 '18 at 10:45
  • resolved: I think something was confused by CSV, as i am using Pandas CSV and CSV to open the files.. so i changed "Import csv" to "Import csv as cf" and updated the code to be cf.reader(fin) and it all works. – Runawaygeek Dec 19 '18 at 11:01