1

I have data like this:

ID        Cue       trial     time     accuracy
A         apple     copy      1450     1 
A         dog       copy      2154     1
A         apple     test1     2121     0
A         dog       test2     0        1
A         apple     final     1231     0
A         dog       final     5411     1
B         apple     copy      818      0
B         ...       ...       ..      ...

And I need to transform it so that each ID and Cue combination is all on one line:

  A       apple   apple    apple   copy  test1   final   1450  2121  1231  1   0   0
  ...

In R, I can do it easily with data.table using dcast like this:

 dcast(data, ID + Cue ~ rowid(ID, Cue), value.var=c("time", "accuracy"))

But in Pandas I have trouble. If I use pivot_table, it does not come out right. For instance, if I do this:

 data.pivot_table(index=['ID', 'Cue', 'TrialType'], values=['time', 'accuracy'])

It always comes out like this:

 A      apple     copy     1450    1
                  final    1231    0
                  test1    2121    0

The problem is that I need the "trial" column to not be stacked, but pivot_table / crosstab do not seem capable of handling this. Be aware that if I put the "columns='trial'" in the pivot_table it will not work for my purposes because the 'trial' column is used as a categorical variable in my analysis.

Any hints are appreciated.

Kalif Vaughn
  • 353
  • 2
  • 5
  • 10

1 Answers1

1

Copy column Cue to new Cue1 for second position by DataFrame.insert, unpivot by DataFrame.melt, add counter by GroupBy.cumcount and reshape by DataFrame.set_index with Series.unstack:

df.insert(1, 'Cue1', df['Cue'])
df = df.melt(['ID','Cue'])
df['g'] = df.groupby(['ID','Cue']).cumcount()
df = df.set_index(['ID', 'Cue', 'g'])['value'].unstack().reset_index(level=1, drop=True)
print (df)

g      0      1      2     3      4      5     6     7     8    9    10   11
ID                                                                          
A   apple  apple  apple  copy  test1  final  1450  2121  1231    1    0    0
A     dog    dog    dog  copy  test2  final  2154     0  5411    1    1    1
B   apple   copy    818     0    NaN    NaN   NaN   NaN   NaN  NaN  NaN  NaN

Another idea:

c = df.columns.difference(['ID'], sort=False)
df = (df.groupby(['ID','Cue'])[c]
        .apply(lambda x: pd.DataFrame([x.T.values.ravel()]))
        .reset_index(level=[1,2], drop=True))
print (df)
        0      1      2     3      4      5       6       7       8    9   10  \
ID                                                                              
A   apple  apple  apple  copy  test1  final  1450.0  2121.0  1231.0  1.0  0.0   
A     dog    dog    dog  copy  test2  final  2154.0     0.0  5411.0  1.0  1.0   
B   apple   copy    818     0    NaN    NaN     NaN     NaN     NaN  NaN  NaN   

     11  
ID       
A   0.0  
A   1.0  
B   NaN  
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252