I can't figure out where the result of pandas pd.pivot_table stores or references the names of the variables used for the table's rows (index) and columns. From reading the source code, they don't seem to be stored in any of its attributes, although str(tbl) obviously gets them from somewhere. I've spent hours trying to figure it out without success.

FYI, the class hierarchy is: pivot_table (tools/pivot.py) returns an instance of class DataFrame (core/frame.py), which inherits from NDFrame (core/generic.py) -> PandasObject (core/base.py) -> StringMixin. But after going through all that source, I don't see the variable names stored anywhere in that hierarchy.

import pandas as pd
import numpy as np

df = pd.DataFrame({'foo': [1,2,2,3,2,3,1,3],
                   'bar': [8,6,8,7,7,6,6,7],
                   'baz': np.random.rand(8).round(2)})

tbl = df.pivot_table(values='baz', index='foo', columns='bar')

# where are the names 'foo', 'bar' stored inside the attributes of tbl?

# bar     6      7     8
# foo                   
# 1    0.39    NaN  0.97
# 2    0.76  0.240  0.97
# 3    0.18  0.245   NaN
Alex Riley
smci

1 Answer


'foo' and 'bar' are stored as the names of the row index and the column index of tbl, respectively. Index objects are distinct from DataFrame/NDFrame objects, which is why the names don't show up anywhere in that class hierarchy.

>>> tbl.index.name
'foo'
>>> tbl.columns.name
'bar'

The relevant part of the source code, where the .name attribute is set, is here.
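
As a rough illustration of the point above, the sketch below rebuilds the table from the question and reads the axis names back off the two Index objects. It is only a sketch under a few assumptions: the 'baz' values are replaced with fixed numbers so the result is reproducible, the labels 'row_key' and 'col_key' are made up for the example, and the keyword form of rename_axis requires a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

# Same shape as the question's data, but with fixed 'baz' values
# (an assumption, so the example is reproducible).
df = pd.DataFrame({'foo': [1, 2, 2, 3, 2, 3, 1, 3],
                   'bar': [8, 6, 8, 7, 7, 6, 6, 7],
                   'baz': np.arange(8) / 10})

tbl = df.pivot_table(values='baz', index='foo', columns='bar')

# The axis names live on the Index objects, not on the DataFrame itself.
index_name = tbl.index.name        # name of the row index
columns_name = tbl.columns.name    # name of the column index

# .names is the list form; it also covers the MultiIndex case,
# e.g. when several keys are passed as index=[...].
index_names = list(tbl.index.names)

# rename_axis returns a new object with different axis names
# ('row_key' and 'col_key' are hypothetical labels); tbl is left unchanged.
renamed = tbl.rename_axis(index='row_key', columns='col_key')
```

Since the names are ordinary attributes of the Index objects, assigning to tbl.index.name directly should also work if an in-place change is preferred.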

Alex Riley