Python Pandas Groupby/Append columns

Question

This is my example dataframe:

    Index  Param1 Param2
    A      1      2
    A      3      4   
    B      1      3
    B      4      Nan
    C      2      4

What I would like to get is:

    Index  Param1 Param2 Param3 Param4
    A      1      2      3      4
    B      1      3      4
    C      2      4

What would be the best way to achieve it using pandas? Thanks in advance for your help.

jezrael · Accepted Answer · 2016-11-24T15:17:56.163

1

You can use groupby with unstack:

def f(x):
    return (pd.DataFrame(np.sort(x.values.ravel())))

df = df.groupby('Index')['Param1','Param2'].apply(f).unstack()
df.columns = df.columns.droplevel(0)
print (df)
       0  1     2     3
Index                  
A      1  2     3     4
B      1  3     4   Nan
C      2  4  None  None

because if use Series get:

TypeError: Series.name must be a hashable type

Another solution with cumcount:

df = df.set_index('Index').stack().reset_index(name='vals')
df['g'] = 'Param' + df.groupby('Index').cumcount().add(1).astype(str)
df = df.pivot(index='Index', columns='g', values='vals')
print (df)
g      Param1  Param2  Param3  Param4
Index                                
A         1.0     2.0     3.0     4.0
B         1.0     3.0     4.0     NaN
C         2.0     4.0     NaN     NaN

edited Nov 24 '16 at 15:17

answered Nov 24 '16 at 14:05

jezrael

822,522
95
1,334
1,252

Thank you! Both methods generally work, however in my real data resulting dataframe has a lot of NaN's in the middle. Is there a way to kind of "move them to the left"? Example row A 1 Nan 2 3 Nan 4, change into A 1 2 3 4 Nan Nan – Greg Nov 24 '16 at 15:15
Yes, you can add `np.sort` to `return (pd.DataFrame(np.sort(x.values.ravel())))` – jezrael Nov 24 '16 at 15:18

score 1 · Answer 2 · answered Nov 24 '16 at 14:05

1

import numpy as np
import pandas as pd

df = pd.DataFrame({'Index': ['A', 'A', 'B', 'B', 'C'], 'Param1': [1, 3, 1, 4, 2],
                   'Param2': [2, 4, 3, np.nan, 4]}).set_index('Index')
print(df)

#        Param1  Param2
# Index                
# A           1     2.0
# A           3     4.0
# B           1     3.0
# B           4     NaN
# C           2     4.0

def fn(g):
    return pd.Series(g.values.ravel())

res = df.groupby(df.index).apply(fn).unstack()
res.columns = ['Param1', 'Param2', 'Param3', 'Param4']
print(res)

#        Param1  Param2  Param3  Param4
# Index                                
# A         1.0     2.0     3.0     4.0
# B         1.0     3.0     4.0     NaN
# C         2.0     4.0     NaN     NaN

answered Nov 24 '16 at 14:05

Alicia Garcia-Raboso

13,193
1
43
48

Hmmm, what is your pandas version? For me it return `TypeError: Series.name must be a hashable type` – jezrael Nov 24 '16 at 14:06
@jezrael: I'm on 0.19.0 – Alicia Garcia-Raboso Nov 24 '16 at 14:06
I use `0.19.1` and I think it is bug – jezrael Nov 24 '16 at 14:07
Interesting, uses python 3 in win 7? – jezrael Nov 24 '16 at 14:11
Python 3, but on a Mac. – Alicia Garcia-Raboso Nov 24 '16 at 14:11

Python Pandas Groupby/Append columns

2 Answers2

Linked