29

I have seen how to work with a double index, but I have not seen how to work with a two-row column headers. Is this possible?

For example, row 1 is a repetitive series of dates: 2016, 2016, 2015, 2015

Row 2 is a repetitive series of data. Dollar Sales, Unit Sales, Dollar Sales, Unit Sales.

So each "Dollar Sales" heading is actually tied to the date in the row above.

Subsequent rows are individual items with data.

Is there a way to do a groupby or some way that I can have two column headers? Ultimately, I want to line up the "Dollar Sales" as a series by date so that I can make a nice graph. Unfortunately there are multiple columns before the next "Dollar Sales" value. (More than just the one "Unit Sales" column). Also if I delete the date row above, there is no link between which "Dollar Sales" are tied to each date.

cottontail
  • 10,268
  • 18
  • 50
  • 51
Stephen
  • 437
  • 1
  • 5
  • 11
  • 1
    You can use a `pandas.MultiIndex` as header. See for example [this](https://stackoverflow.com/questions/37369317/how-do-i-change-or-access-pandas-multiindex-column-headers) and [this](https://stackoverflow.com/questions/23793463/pandas-multiindexing-column-headers). – Midnighter Dec 06 '16 at 21:57
  • This works, thank you. I was unaware that MultiIndex would also apply to column headers... still learning pandas. – Stephen Dec 08 '16 at 21:35

2 Answers2

36

If using pandas.read_csv() or pandas.read_table(), you can provide a list of indices for the header argument, to specify the rows you want to use for column headers. Python will generate the pandas.MultiIndex for you in df.columns:

df = pandas.read_csv('DollarUnitSales.csv', header=[0,1])

You can also use more than two rows, or non-consecutive rows, to specify the column headers:

df = pandas.read_table('DataSheet1.csv', header=[0,2,3])
Kevin
  • 16,549
  • 8
  • 60
  • 74
  • How will it look like? Can you add an example? – PV8 Dec 11 '19 at 14:11
  • 1
    Awesome solution squareskittles!! This is exactly what I was looking for...for the past several days!! Thanks so much!! – ASH Jan 15 '21 at 00:02
0

A MultiIndex can be created from rows and assigned as the new column labels.

For example, to make the following transformation, use pd.MultiIndex.from_frame().

result

df = pd.DataFrame([[2016, 2016, 2015, 2015], 
                   ['Dollar Sales', 'Unit Sales', 'Dollar Sales', 'Unit Sales'], 
                   [1, 2, 3, 4], [5, 6, 7, 8]], columns=[*'ABCD'])

new_labels = pd.MultiIndex.from_frame(df.iloc[:2].T.astype(str), names=['Year', 'Sales'])
df1 = df.set_axis(new_labels, axis=1).iloc[2:]

A MultiIndex can also be created from the old column labels and a dataframe row. For example, to make the following transformation, use pd.MultiIndex.from_arrays().

result2

df = pd.DataFrame([['Dollar Sales', 'Unit Sales', 'Dollar Sales', 'Unit Sales'], 
                   [1, 2, 3, 4], [5, 6, 7, 8]], columns=[2016, 2016, 2015, 2015])

new_labels = pd.MultiIndex.from_arrays([df.columns, df.iloc[0]], names=['Year', 'Sales'])
df1 = df.set_axis(new_labels, axis=1).iloc[1:]

N.B. An important thing to note is the dtypes of the dataframe could be not ideal for the data it holds; astype(int) etc. could be necessary in the end.

Also, reset_index(drop=True) may be needed if the index should be reset.

cottontail
  • 10,268
  • 18
  • 50
  • 51