1

I am trying to convert a dataframe of structure:

ID ID2 ID3 R_u L_u R_sd L_sd
1  F   G1   x   y   z    t
2  M   G2   x   y   z    t

into

ID ID2 ID3 Side u sd
1  F   G1   R    x z
1  F   G1   L    y t 
2  M   G2   R    x z
2  M   G2   L    y t 

I used pandas.melt function

df_melt = df(id_vars=[('ID')], value_vars=['R_u', 'L_u'], 
                                  var_name='Side', value_name = 'u')

but I couldn't find a way for more than four or six number of columns simultaneously. I guess I can start with melt and then feed each row using lambda but I feel like I can do this automatically.

Any possible solution, please?

petezurich
  • 9,280
  • 9
  • 43
  • 57
SAM
  • 55
  • 1
  • 7

4 Answers4

2

Set the ID column as index then split and expand the remaining columns to convert to multiindex then stack the level 0 of multiindex to reshape

s = df.set_index('ID')
s.columns = s.columns.str.split('_', expand=True)
s = s.stack(0)

Alternative approach with wide_to_long

pd.wide_to_long(df, i='ID', j='Side', stubnames=['R', 'L'], sep='_', suffix='\w+').stack().unstack(-2)

Result

     sd  u
ID        
1  L  t  y
   R  z  x
2  L  t  y
   R  z  x

Another approach with wide_to_long, which might be a bit longer:

temp = df.set_index(['ID','ID2','ID3'])
temp.columns = temp.columns.str.split("_").str[::-1].str.join("_")
(pd
.wide_to_long(
    temp.reset_index(), 
    i= temp.index.names, 
    j = 'Side', 
    stubnames = ['u', 'sd'], 
    sep='_', 
    suffix='.+')
.reset_index()
)

   ID ID2 ID3 Side  u sd
0   1   F  G1    R  x  z
1   1   F  G1    L  y  t
2   2   M  G2    R  x  z
3   2   M  G2    L  y  t
sammywemmy
  • 27,093
  • 4
  • 17
  • 31
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53
  • Ah this is a great solution but doesn't work if I have multiple ID columns -- I will edit my question – SAM Feb 05 '23 at 08:27
  • If you have multiple id columns then you can set the index accordingly.. for e.g.`df.set_index(['col1', 'col2', ...])` – Shubham Sharma Feb 05 '23 at 08:31
2

Good use case for janitor's pivot_longer:

# pip install janitor
import janitor

out = df.pivot_longer(index="ID*", names_to=['Side', '.value'], names_sep='_',
                      sort_by_appearance=True # optional
                      )

Output:

   ID ID2 ID3 Side  u sd
0   1   F  G1    R  x  z
1   1   F  G1    L  y  t
2   2   M  G2    R  x  z
3   2   M  G2    L  y  t

With pure pandas, using reshaping and a MultiIndex:

cols = list(df.filter(like='ID'))

out = (df.set_index(cols)
         .pipe(lambda d: d.set_axis(d.columns.str.split('_', expand=True), axis=1))
         .rename_axis(columns=('Side', None))
         .stack(0).reset_index()
      )

Output:

   ID ID2 ID3 Side sd  u
0   1   F  G1    L  t  y
1   1   F  G1    R  z  x
2   2   M  G2    L  t  y
3   2   M  G2    R  z  x
sammywemmy
  • 27,093
  • 4
  • 17
  • 31
mozway
  • 194,879
  • 13
  • 39
  • 75
  • 1
    @sammywemmy you always amaze me with `janitor`, this possibility wasn't clearly described in the docs. Thanks! :) – mozway Feb 05 '23 at 11:06
  • 1
    yea, `pivot_longer` has a number of features and conveniences ... not sure how to balance adding examples and not making it bloated ... open to PRs to help with that :) – sammywemmy Feb 05 '23 at 11:22
  • Interesting, never used janitor before. It’s such a shame because melt is so easy to use – SAM Feb 05 '23 at 14:24
1

use pandasql its simple:

#%%
df1.sql("""
    select ID,ID2,ID3,'R' as Side,R_u as u,R_sd as sd from self
    union all select ID,ID2,ID3,'L' as Side,L_u as u,L_sd as sd from self
""").sort_values("ID")

out:

 ID ID2 ID3 Side  u sd
0   1   F  G1    R  x  z
2   1   F  G1    L  y  t
1   2   M  G2    R  x  z
3   2   M  G2    L  y  t

or pandas:

df11=df1.set_index(["ID","ID2","ID3"]).stack().reset_index(level=3)
df11.level_3.str.split("_",expand=True).set_axis(["Side","col1"],axis=1).assign(col2=df11.iloc[:,1]).reset_index().pivot(["ID","ID2","ID3","Side"],"col1","col2").reset_index(level=[1,2,3])

from pandasql import sqldf
sqldf("""
    select ID,ID2,ID3,'R' as Side,R_u as u,R_sd as sd from df1
    union all select ID,ID2,ID3,'L' as Side,L_u as u,L_sd as sd from df1
""",globals())

enter image description here

G.G
  • 639
  • 1
  • 5
  • Your `pandasql` approach gives this error: `'DataFrame' object has no attribute 'sql'`. – PaulS Feb 06 '23 at 10:35
  • The error is reported because I modified the pandas source code to encapsulate the sql method, if you need the original usage, please refer to the end of the answer @PaulS – G.G Feb 06 '23 at 11:24
  • Thanks for your answer, @G.G., but still getting errors: `ObjectNotExecutableError: Not an executable object: "\n select ID,ID2,ID3,'R' as Side,R_u as u,R_sd as sd from df\n union all select ID,ID2,ID3,'L' as Side,L_u as u,L_sd as sd from df\n"`. – PaulS Feb 06 '23 at 11:31
  • it works on my pycharm ,look the picture. maybe the reason is " , you can use """ """ – G.G Feb 06 '23 at 11:40
  • Thanks again, @G.G. For some reason, every example I use with `sqldf` gives an error. Therefore, I think the problem is on my side. – PaulS Feb 06 '23 at 12:00
0

A possible solution:

import re 

(df.filter(like='ID')
 .join(pd.concat(
     [df.filter(like=x).rename(lambda c: re.sub(r'^.*_', '', c), axis=1)
      .assign(Side = x) for x in ['R', 'L']])))

Output:

   ID ID2 ID3  u sd Side
0   1   F  G1  x  z    R
0   1   F  G1  y  t    L
1   2   M  G2  x  z    R
1   2   M  G2  y  t    L

Another possible solution, based on pandas.melt + pandas.DataFrame.pivot:

ids = ['ID', 'ID2', 'ID3']
d = df.melt(ids)
(d.join(d['variable'].str.extract('(.*)_(.*)'))
 .rename({0: 'Side'}, axis=1)
 .pivot(index = ids + ['Side'], columns=[1], values='value')
 .reset_index().rename_axis(None, axis=1))

Output:

   ID ID2 ID3 Side sd  u
0   1   F  G1    L  t  y
1   1   F  G1    R  z  x
2   2   M  G2    L  t  y
3   2   M  G2    R  z  x
PaulS
  • 21,159
  • 2
  • 9
  • 26