Melt/unpivot a dataset with multiple groups of values

Question

I'm trying to convert a dataframe in Python, and I am stuck because I don't know exactly how to phrase what I want to do (which makes searching for it difficult). It seems like I need a combination of unstack and pivot. I can explain it with an example though, hopefully. I have a dataframe with this shape:

userid	GroupA_measure1	GroupA_measure2	GroupB_measure1	GroupB_measure2
001	65	70	45	50
002	96	89	12	8
003	12	14	38	40

and I want to convert this to this format:

userid	measure	groupA	groupB
001	1	65	45
001	2	70	50
002	1	96	12
002	2	89	8
003	1	12	38
003	2	14	40

I can unstack the whole df using pd.melt(df, id_vars =['userid']) which put all values in separate rows, but I want to keep separate columns for the values of GroupA and GroupB.

Any help would be much appreciated.

jezrael · Accepted Answer · 2021-03-08T13:39:48.047

Use wide_to_long with extract numbers from measure column by Series.str.extract:

df1 = pd.wide_to_long(df, 
                      stubnames=['GroupA','GroupB'], 
                      i='userid', 
                      j='measure', sep='_', suffix=r'\w+').reset_index()

df1['measure'] = df1['measure'].str.extract('(\d+)').astype(int)

Or convert not _ columns first, split all columns with _ and reshape by DataFrame.stack, last also extract digits:

df1 = df.set_index('userid')
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.rename_axis((None, 'measure'), axis=1).stack().reset_index()
df1['measure'] = df1['measure'].str.extract('(\d+)').astype(int)
print (df1)
  userid  measure GroupA GroupB
0    001        1     65     45
1    002        1     96     12
2    003        1     12     38
3    001        2     70     50
4    002        2     89      8
5    003        2     14     40

Last if necessary sorting by DataFrame.sort_values:

df1 = df1.sort_values('userid', ignore_index=True)
print (df1)
  userid  measure GroupA GroupB
0    001        1     65     45
1    001        2     70     50
2    002        1     96     12
3    002        2     89      8
4    003        1     12     38
5    003        2     14     40

score 0 · Answer 2 · answered Mar 21 '22 at 11:19

One option is with the pivot_longer function from pyjanitor, using the .value placeholder:

# pip install pyjanitor
import pandas as pd
import janitor

df.pivot_longer(index="userid", 
                names_to=(".value", "measure"), 
                names_pattern=r"(.+)_*(\d)"
               )

   userid measure  GroupA_measure  GroupB_measure
0     001       1              65              45
1     002       1              96              12
2     003       1              12              38
3     001       2              70              50
4     002       2              89               8
5     003       2              14              40

The names_pattern is a regex to split the columns. The .value keeps the part of the column associated with it as header, while the number extracted via the regex goes into the measure column.

Melt/unpivot a dataset with multiple groups of values

2 Answers2