1

I have a pandas dataframe with a multi-index where several groupby aggregations have already been computed. I'm trying to pivot the final grouped column into headings, without any aggregation. I've managed to accomplish it with aggregation, but can't seem to figure out how to simply transpose it so the 2nd last column becomes headers and the last column becomes the values under each header.

A similar example:

df = pd.DataFrame.from_dict(
   {'Classifier':['SVM']*6 + ['CNN']*6,
    'Preprocess': (['None']*3 + ['PCA']*3)*2,
    'Group': ([1]*2 + [2]*2)*3,
    'Fold': ([1] + [2])*6,
    'Accuracy': np.random.rand((12))
    })
df = df.groupby(['Classifier', 'Preprocess', 'Group']).agg({'Accuracy': 'mean'})
print df
                             Accuracy
Classifier Preprocess Group          
CNN        None       1      0.912794
                      2      0.514453
           PCA        1      0.694035
                      2      0.740769
SVM        None       1      0.235105
                      2      0.817883
           PCA        1      0.567313
                      2      0.657962

I'm trying to get it to look something like:

                      Accuracy - Group
Classifier Preprocess 1         2        
CNN        None       0.912794  0.514453              
           PCA        0.694035  0.740769
SVM        None       0.235105  0.817883
           PCA        0.567313  0.657962

I should note that the 'Accuracy' column actually ends up being a string because of extra calculations. Any ideas? I feel like I've tried every combination of weird pivot/stack/unstack/re-index, but the closest I can get is a very short table where everything is aggregated down oddly when I want to actually maintain the final column numbers (no aggregation).

peur
  • 13
  • 3
  • `df.unstack(-1)` – cs95 Jan 11 '18 at 06:51
  • 1
    @COLDSPEED Whoops, thought I tried that one. Thank you! – peur Jan 11 '18 at 06:58
  • @peur - so need `df.unstack(-1)` only? No necessary `rename` ? – jezrael Jan 11 '18 at 06:59
  • @jezrael I was mostly confused about the `unstack`, but the `rename` is very helpful as well – peur Jan 11 '18 at 07:09
  • @peur - I ask it because from your question I understand you need also `rename`. So then it is not dupe. But if need only `unstack`, please unaccept and delete your question, because dupe. – jezrael Jan 11 '18 at 07:11
  • @jezrael I'm relatively new to stack overflow, so I'm unsure on duplicate etiquette, but these meta post - [relevant meta-post](https://meta.stackoverflow.com/questions/265736/should-i-delete-my-question-if-it-is-marked-as-a-duplicate), [another](https://meta.stackexchange.com/questions/10841/how-should-duplicate-questions-be-handled) - seems to suggest I should leave it. Or should I unaccept your answer and click 'that solved my question' on the duplicate marker? – peur Jan 11 '18 at 07:23
  • @peur - Hmmm, you cannot mark as accepted to another solution, which is not under your question. You can only upvote, if get reputation 15+. But if need only `unstack`, what is one function answer, then is clear dupe and no reason for leave it. – jezrael Jan 11 '18 at 07:31
  • @peur - but if it is combination of multiple functions is possible it is not dupe, e.g. my solution is not dupe if need `unstack` and also `rename` first level of `MultiIndex` in columns. – jezrael Jan 11 '18 at 07:34

1 Answers1

1

I think you need unstack + rename:

print (df.unstack().rename(columns=lambda x: x + ' - '+ df.index.names[-1], level=0))
                      Accuracy - Group          
Group                                1         2
Classifier Preprocess                           
CNN        None               0.879696  0.644665
           PCA                0.613153  0.502952
SVM        None               0.589627  0.323793
           PCA                0.619407  0.356250

Alternative:

df1 = df.unstack().rename(columns=lambda x: '{} - {}'.format(x, df.index.names[-1]), level=0)
print (df1)
                      Accuracy - Group          
Group                                1         2
Classifier Preprocess                           
CNN        None               0.376257  0.531367
           PCA                0.058679  0.064388
SVM        None               0.630307  0.908984
           PCA                0.312942  0.895200

Detail:

print (df.index.names)
['Classifier', 'Preprocess', 'Group']

print (df.index.names[-1])
Group

print (df.unstack())
                       Accuracy          
Group                         1         2
Classifier Preprocess                    
CNN        None        0.376257  0.531367
           PCA         0.058679  0.064388
SVM        None        0.630307  0.908984
           PCA         0.312942  0.895200
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252