How to make two rows in a pandas dataframe into column headers

Question

I have seen how to work with a double index, but I have not seen how to work with a two-row column headers. Is this possible?

For example, row 1 is a repetitive series of dates: 2016, 2016, 2015, 2015

Row 2 is a repetitive series of data. Dollar Sales, Unit Sales, Dollar Sales, Unit Sales.

So each "Dollar Sales" heading is actually tied to the date in the row above.

Subsequent rows are individual items with data.

Is there a way to do a groupby or some way that I can have two column headers? Ultimately, I want to line up the "Dollar Sales" as a series by date so that I can make a nice graph. Unfortunately there are multiple columns before the next "Dollar Sales" value. (More than just the one "Unit Sales" column). Also if I delete the date row above, there is no link between which "Dollar Sales" are tied to each date.

You can use a `pandas.MultiIndex` as header. See for example [this](https://stackoverflow.com/questions/37369317/how-do-i-change-or-access-pandas-multiindex-column-headers) and [this](https://stackoverflow.com/questions/23793463/pandas-multiindexing-column-headers). — Midnighter, Dec 06 '16 at 21:57
This works, thank you. I was unaware that MultiIndex would also apply to column headers... still learning pandas. — Stephen, Dec 08 '16 at 21:35

Kevin · Answer 1 · 2019-11-20T12:47:12.220

36

If using pandas.read_csv() or pandas.read_table(), you can provide a list of indices for the header argument, to specify the rows you want to use for column headers. Python will generate the pandas.MultiIndex for you in df.columns:

df = pandas.read_csv('DollarUnitSales.csv', header=[0,1])

You can also use more than two rows, or non-consecutive rows, to specify the column headers:

df = pandas.read_table('DataSheet1.csv', header=[0,2,3])

edited Nov 20 '19 at 12:47

answered Aug 08 '18 at 12:00

Kevin

16,549
8
60
74

How will it look like? Can you add an example? – PV8 Dec 11 '19 at 14:11
1

Awesome solution squareskittles!! This is exactly what I was looking for...for the past several days!! Thanks so much!! – ASH Jan 15 '21 at 00:02

score 0 · Answer 2 · answered Mar 27 '23 at 01:32

A MultiIndex can be created from rows and assigned as the new column labels.

For example, to make the following transformation, use pd.MultiIndex.from_frame().

df = pd.DataFrame([[2016, 2016, 2015, 2015], 
                   ['Dollar Sales', 'Unit Sales', 'Dollar Sales', 'Unit Sales'], 
                   [1, 2, 3, 4], [5, 6, 7, 8]], columns=[*'ABCD'])

new_labels = pd.MultiIndex.from_frame(df.iloc[:2].T.astype(str), names=['Year', 'Sales'])
df1 = df.set_axis(new_labels, axis=1).iloc[2:]

A MultiIndex can also be created from the old column labels and a dataframe row. For example, to make the following transformation, use pd.MultiIndex.from_arrays().

df = pd.DataFrame([['Dollar Sales', 'Unit Sales', 'Dollar Sales', 'Unit Sales'], 
                   [1, 2, 3, 4], [5, 6, 7, 8]], columns=[2016, 2016, 2015, 2015])

new_labels = pd.MultiIndex.from_arrays([df.columns, df.iloc[0]], names=['Year', 'Sales'])
df1 = df.set_axis(new_labels, axis=1).iloc[1:]

N.B. An important thing to note is the dtypes of the dataframe could be not ideal for the data it holds; astype(int) etc. could be necessary in the end.

Also, reset_index(drop=True) may be needed if the index should be reset.

How to make two rows in a pandas dataframe into column headers

2 Answers2

Linked

Related