1

I need to transform df1 to df2:

import pandas as pd
from pandas import DataFrame, Series

import numpy as np

df1 = pd.DataFrame(index=['date_1', 'date_2', 'date_3'], 
              columns=["A_count", "A_dollar", "B_count", "B_dollar"], 
              data=[[10,"$100",7,"$786"], [3,"$43",6,"$88"],     [5,"$565",8,"$876"]])
df1

enter image description here

Basically what I need is put the items (A and B) as labels in a new column, then move the 3rd and 4th columns data each row under A items. That will give us a new row for each date.

enter image description here

Psidom
  • 209,562
  • 33
  • 339
  • 356
DanZimmerman
  • 1,626
  • 6
  • 23
  • 45

1 Answers1

3

You can convert the columns to multi-index by splitting it with underscore, then reshape it to long format with stack:

df1.columns = df1.columns.str.split("_", expand=True)
df1.stack(level=0).rename_axis((None, "item")).reset_index("item")

enter image description here

If you have multiple underscores in the column names as follows:

df1 = pd.DataFrame(index=['date_1', 'date_2', 'date_3'], 
              columns=["A_x_count", "A_x_dollar", "B_y_count", "B_y_dollar"], 
              data=[[10,"$100",7,"$786"], [3,"$43",6,"$88"],     [5,"$565",8,"$876"]])
df1

enter image description here

You can use rsplit with n = 1 so that it will only split on the last underscore:

df1.columns = df1.columns.str.rsplit("_", n=1, expand=True)
df1.stack(level=0).rename_axis((None, "item")).reset_index("item")

enter image description here

Psidom
  • 209,562
  • 33
  • 339
  • 356
  • Thanks for the quick response. What if the original column names already have "_" and I only want to split the names by the last "_"? – DanZimmerman Apr 04 '17 at 15:53
  • What do you mean `""`? Empty string? – Psidom Apr 04 '17 at 15:56
  • Sorry, for some reason the underscore did not show up. "underscore", "_" – DanZimmerman Apr 04 '17 at 16:23
  • If there are multiple underscores in the column names, and you want to split on the last one, you can use `rsplit`(right split) with maximum of split of `1`. so `df1.columns = df1.columns.str.rsplit("_", n=1, expand=True)` should work. – Psidom Apr 04 '17 at 16:50
  • Thanks Psidom, I really appreciate this. Let me try your solution on my bigger data set and I will accept the answer once I'm done with my test. Have a good one :) – DanZimmerman Apr 04 '17 at 16:53
  • Hey Psidom, why are you such a genius? Your solution worked perfectly on my larger data set and I've accepted your answer. Thanks! – DanZimmerman Apr 04 '17 at 18:15