2

I have the following table :

In [303]: table.head()
Out[303]: 
            people  weekday  weekofyear
2012-01-01     119        6          52
2012-01-02      76        0           1
2012-01-03      95        1           1
2012-01-04     102        2           1
2012-01-05      87        3           1

I would like to create a simple pd.DataFrame where :

  • columns = [1, 2, ..., 52] (weekofyear)
  • rows = [0, 1, ..., 6] (weekday)
  • values = np.sum

I tried using pd.pivot_table which gave me the expected result :

In [308]: p = pd.pivot_table(table, index=["weekday"], columns=["weekofyear"], values=["people"], aggfunc=[np.sum])
     ...: p
     ...: 
Out[308]: 
              sum                                             ...             \
           people                                             ...              
weekofyear     1    2    3    4    5    6    7    8   9    10 ...    43   44   
weekday                                                       ...              
0             162   86   84   95   92   98  108  102  97   87 ...   108   86   
1              95  113   88   78  108  112   98  104  87  105 ...    85   82   
2             102   70   93   82  103   80  103   85  82   96 ...    87  105   
3              87   91  101   83   91  100  100   80  89   86 ...    87   91   
4             111   91  110  103   93  116  110   99  78   77 ...    83  102   
5             117  107   99   88   97   90  100   91  97   88 ...   103  110   
6              92   95   90   86   91  103   98  100  89   96 ...    94  101   



weekofyear   45   46   47   48   49   50   51   52  
weekday                                             
0            99   92   99   83  107  106   93  107  
1           105   83  101   93  102   89  113   84  
2            96   84  110   83  104   84   84  116  
3            87   96   87   88   88   83  113   93  
4            93   81  104  108   72  101  109   97  
5            81  107   97   89   86  108  113  101  
6            93   92   93   91   89   96   93  226  

[7 rows x 52 columns]

but instead of having my weekofyears columns, I got stuck with a MultiIndex I could not get rid of. As shown below :

In [309]: p.columns
Out[309]: 
MultiIndex(levels=[['sum'], ['people'], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]],
           names=[None, None, 'weekofyear']

while the index seems fine :

In [311]: p.index
Out[311]: Int64Index([0, 1, 2, 3, 4, 5, 6], dtype='int64', name='weekday'  

I tried playing with unstack() & reset_index() functions, without success.

Am I missing something ?

Anand S Kumar
  • 88,551
  • 18
  • 188
  • 176
gowithefloww
  • 2,211
  • 2
  • 20
  • 31

1 Answers1

2

Instead of giving lists to values and aggfunc, you should try giving single values to them. Example -

p = pd.pivot_table(table, index=["weekday"], columns=["weekofyear"], values="people", aggfunc=np.sum)

Demo -

In [3]: table
Out[3]:
            people  weekday  weekofyear
2012-01-01     119        6          52
2012-01-02      76        0           1
2012-01-03      95        1           1
2012-01-04     102        2           1
2012-01-05      87        3           1

In [12]: p = pd.pivot_table(table, index=["weekday"], columns=["weekofyear"], values="people", aggfunc=np.sum)

In [13]: p
Out[13]:
weekofyear   1    52
weekday
0            76  NaN
1            95  NaN
2           102  NaN
3            87  NaN
6           NaN  119

In [14]: p.columns
Out[14]: Int64Index([1, 52], dtype='int64', name='weekofyear')

From documentation -

aggfunc : function, default numpy.mean, or list of functions
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)

Similar is the case with values , though not specifically mentioned in the documentation

Anand S Kumar
  • 88,551
  • 18
  • 188
  • 176