Python - Restructure Dataframe, move column names to rows, reshape dataframe

Question

I need to transform df1 to df2:

import pandas as pd
from pandas import DataFrame, Series

import numpy as np

df1 = pd.DataFrame(index=['date_1', 'date_2', 'date_3'], 
              columns=["A_count", "A_dollar", "B_count", "B_dollar"], 
              data=[[10,"$100",7,"$786"], [3,"$43",6,"$88"],     [5,"$565",8,"$876"]])
df1

Basically what I need is put the items (A and B) as labels in a new column, then move the 3rd and 4th columns data each row under A items. That will give us a new row for each date.

Psidom · Accepted Answer · 2017-04-04T16:53:43.560

3

You can convert the columns to multi-index by splitting it with underscore, then reshape it to long format with stack:

df1.columns = df1.columns.str.split("_", expand=True)
df1.stack(level=0).rename_axis((None, "item")).reset_index("item")

If you have multiple underscores in the column names as follows:

df1 = pd.DataFrame(index=['date_1', 'date_2', 'date_3'], 
              columns=["A_x_count", "A_x_dollar", "B_y_count", "B_y_dollar"], 
              data=[[10,"$100",7,"$786"], [3,"$43",6,"$88"],     [5,"$565",8,"$876"]])
df1

You can use rsplit with n = 1 so that it will only split on the last underscore:

df1.columns = df1.columns.str.rsplit("_", n=1, expand=True)
df1.stack(level=0).rename_axis((None, "item")).reset_index("item")

edited Apr 04 '17 at 16:53

answered Apr 04 '17 at 15:01

Psidom

209,562
33
339
356

Thanks for the quick response. What if the original column names already have "_" and I only want to split the names by the last "_"? – DanZimmerman Apr 04 '17 at 15:53
What do you mean `""`? Empty string? – Psidom Apr 04 '17 at 15:56
Sorry, for some reason the underscore did not show up. "underscore", "_" – DanZimmerman Apr 04 '17 at 16:23
If there are multiple underscores in the column names, and you want to split on the last one, you can use `rsplit`(right split) with maximum of split of `1`. so `df1.columns = df1.columns.str.rsplit("_", n=1, expand=True)` should work. – Psidom Apr 04 '17 at 16:50
Thanks Psidom, I really appreciate this. Let me try your solution on my bigger data set and I will accept the answer once I'm done with my test. Have a good one :) – DanZimmerman Apr 04 '17 at 16:53
Hey Psidom, why are you such a genius? Your solution worked perfectly on my larger data set and I've accepted your answer. Thanks! – DanZimmerman Apr 04 '17 at 18:15

Python - Restructure Dataframe, move column names to rows, reshape dataframe

1 Answers1