Suppose I have a DataFrame with a multi-level index made up of 'date', 'tenor', 'mat' and 'strike', where the observed values are stored in the column 'vol':

date     tenor   mat strike    vol                                      
20120903 3m      1y  0.25      52.
                     0.50      51.
                     1.00      49.
20120903 3m      5y  0.25      32.
                     0.50      55.
                     1.00      23.
20120903 3m      10y 0.25      65.
                     0.50      55.
                     1.00      19.
20120904 3m      1y  0.25      32.
                     0.50      57.
                     1.00      44.
20120904 3m      5y  0.25      54.
                     0.50      50.
                     1.00      69.
20120904 3m      10y 0.25      42.
                     0.50      81.
                     1.00      99.

Say I want to reorganize this data into a new DataFrame indexed by 'date' and 'tenor', with a 'values' entry holding a nested 3D structure built from 'mat', 'strike' and 'vol' of the original DataFrame, like this:

date     tenor   values                                                       
20120903 3m      [[1y,5y,10y],[0.25, 0.50, 1.00], [52., 51., 49.],
                                                  [32., 55., 23.],
                                                  [65., 55., 19.]]
20120904 3m      [[1y,5y,10y],[0.25, 0.50, 1.00], [32., 57., 44.],
                                                  [54., 50., 69.],
                                                  [42., 81., 99.]]

I tried various combinations of 'unstack', 'groupby' and 'pivot', but with no success. I could only reach my objective by using a lot of Python vector manipulation, which was slow and inefficient. Is there a more efficient, pandas-specific way to get the same result? I'm getting lost at this... Thanks for your help, Maurizio

mspadaccino

1 Answer

How about something like this:

In [111]: df
Out[111]: 
                mat  strike  vol
date     tenor                  
20120903 3m      1y    0.25   52
         3m      1y    0.50   51
         3m      1y    1.00   49
         3m      5y    0.25   32
         3m      5y    0.50   55
         3m      5y    1.00   23
         3m     10y    0.25   65
         3m     10y    0.50   55
         3m     10y    1.00   19
20120904 3m      1y    0.25   32
         3m      1y    0.50   57
         3m      1y    1.00   44
         3m      5y    0.25   54
         3m      5y    0.50   50
         3m      5y    1.00   69
         3m     10y    0.25   42
         3m     10y    0.50   81
         3m     10y    1.00   99

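In [112]: def agg_func(x):
   .....:     # Keep maturities and strikes in their order of first appearance
   .....:     mats = list(x.mat.unique())
   .....:     strikes = list(x.strike.unique())
   .....:     # pivot sorts its labels, so reindex to restore the original order
   .....:     # (recent pandas versions require keyword arguments for pivot)
   .....:     vols = x.pivot(index='mat', columns='strike', values='vol')
   .....:     vols = vols.reindex(index=mats, columns=strikes)
   .....:     return [mats, strikes, vols.values.tolist()]
   .....: 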

In [113]: rs = df.groupby(level=['date', 'tenor']).apply(agg_func)

In [114]: rs
Out[114]: 
date      tenor
20120903  3m       [['1y', '5y', '10y'], [0.25, 0.5, 1.0], [[52.0...
20120904  3m       [['1y', '5y', '10y'], [0.25, 0.5, 1.0], [[32.0...

In [115]: rs.values[0]
Out[115]: 
[['1y', '5y', '10y'],
 [0.25, 0.5, 1.0],
 [[52.0, 51.0, 49.0], [32.0, 55.0, 23.0], [65.0, 55.0, 19.0]]]
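
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It assumes a recent pandas release, where `pivot` takes keyword arguments. The `build_sample` helper is hypothetical and only rebuilds the question's data so the snippet can run on its own.

```python
import pandas as pd


def build_sample():
    """Hypothetical helper: rebuild the question's data, indexed by date and tenor."""
    strikes = [0.25, 0.50, 1.00]
    vols = {
        (20120903, "1y"): [52.0, 51.0, 49.0],
        (20120903, "5y"): [32.0, 55.0, 23.0],
        (20120903, "10y"): [65.0, 55.0, 19.0],
        (20120904, "1y"): [32.0, 57.0, 44.0],
        (20120904, "5y"): [54.0, 50.0, 69.0],
        (20120904, "10y"): [42.0, 81.0, 99.0],
    }
    rows = [
        (date, "3m", mat, strike, vol)
        for (date, mat), values in vols.items()
        for strike, vol in zip(strikes, values)
    ]
    flat = pd.DataFrame(rows, columns=["date", "tenor", "mat", "strike", "vol"])
    return flat.set_index(["date", "tenor"])


def agg_func(x):
    # Keep maturities and strikes in their order of first appearance.
    mats = list(x["mat"].unique())
    strikes = list(x["strike"].unique())
    # pivot sorts its labels, so reindex to restore the original order.
    vols = x.pivot(index="mat", columns="strike", values="vol")
    vols = vols.reindex(index=mats, columns=strikes)
    return [mats, strikes, vols.values.tolist()]


df = build_sample()
rs = df.groupby(level=["date", "tenor"]).apply(agg_func)
first = rs.iloc[0]  # nested list for (20120903, '3m')
```

Each element of `rs` is a plain Python list, so the result is an object-dtype Series. That matches the layout asked for, although it gives up vectorized operations on the nested values.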
Chang She
  • Hi Chang, thanks for your reply. One last question: how do you get the 'df' in the first line? Yours looks different from my original one, in that you appear to have three columns ('mat', 'strike' and 'vol'), whilst mine has just the 'vol' values, the others being part of the index – mspadaccino Sep 23 '12 at 19:46
  • I just pasted your snippet in. You can use `set_index` or `reset_index` to make some columns the index or vice versa. My df has a MultiIndex with "date" and "tenor" as levels. – Chang She Sep 23 '12 at 22:19
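
As a rough illustration of the `reset_index` suggestion, the sketch below assumes the starting frame matches the question's description: all four of 'date', 'tenor', 'mat' and 'strike' are index levels and 'vol' is the only column. The name `original` and the two sample rows are placeholders for illustration.

```python
import pandas as pd

# Assumed starting point: every key is an index level, 'vol' is the only column.
original = pd.DataFrame(
    {"vol": [52.0, 51.0]},
    index=pd.MultiIndex.from_tuples(
        [(20120903, "3m", "1y", 0.25), (20120903, "3m", "1y", 0.50)],
        names=["date", "tenor", "mat", "strike"],
    ),
)

# Move 'mat' and 'strike' out of the index into ordinary columns,
# leaving a MultiIndex with 'date' and 'tenor' as its levels.
df = original.reset_index(level=["mat", "strike"])
```

After this step `df` has the columns 'mat', 'strike' and 'vol', which is the shape the answer's `agg_func` expects.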