How do I convert a pandas pivot table to a dataframe

Question

I want to use a pivot table to summarise a dataset and then be able to access the information in the pivot table as if it were a DataFrame.

Consider a hierarchical dataset with patients being treated in a hospital and hospitals situated in regions:

import pandas as pd

example_data = {'patient' : ['p1','p2','p3','p4','p5','p6','p7','p8','p9','p10','p11','p12','p13','p14','p15','p16','p17','p18','p19','p20','p21','p22','p23','p24','p25','p26','p27','p28','p29','p30','p31','p32','p33','p34','p35','p36','p37','p38','p39','p40','p41','p42','p43','p44','p45','p46','p47','p48','p49','p50','p51','p52','p53','p54','p55','p56','p57','p58','p59','p60','p61','p62','p63'], 
                'hospital' : ['h1','h1','h1','h2','h2','h2','h2','h3','h3','h3','h3','h3','h4','h4','h4','h4','h4','h4','h5','h5','h5','h5','h5','h5','h5','h6','h6','h6','h6','h6','h6','h6','h6','h7','h7','h7','h7','h7','h7','h7','h7','h7','h8','h8','h8','h8','h8','h8','h8','h8','h8','h8','h9','h9','h9','h9','h9','h9','h9','h9','h9','h9','h9'], 
                'region' : ['r1','r1','r1','r1','r1','r1','r1','r1','r1','r1','r1','r1','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3'] }

example_dataframe = pd.DataFrame(example_data)

print example_dataframe

This produces a simple output as follows:

   hospital patient region
0        h1      p1     r1
1        h1      p2     r1
2        h1      p3     r1
3        h2      p4     r1
4        h2      p5     r1
5        h2      p6     r1
6        h2      p7     r1
7        h3      p8     r1
8        h3      p9     r1
9        h3     p10     r1
10       h3     p11     r1
11       h3     p12     r1
12       h4     p13     r2
13       h4     p14     r2
14       h4     p15     r2
15       h4     p16     r2
16       h4     p17     r2
etc.

Now I want to summarise using a pivot table, simply counting the number of patients in each hospital:

example_pivot_table = pd.pivot_table(example_dataframe, values='patient', rows=['hospital','region'], aggfunc='count')

print example_pivot_table

This produces the following output:

hospital  region
h1        r1         3
h2        r1         4
h3        r1         5
h4        r2         6
h5        r2         7
h6        r2         8
h7        r3         9
h8        r3        10
h9        r3        11
Name: patient, dtype: int64

As far as I understand it, this is actually a multi-indexed series.

How can I use this data to find out what region hospital h7 is in? If hospital, region and the patient count data were separate columns in a DataFrame, it would be easy. But I think hospital and region are indices. I've tried lots of things but haven't been able to get it to work.

try [`.reset_index()`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reset_index.html#pandas.DataFrame.reset_index) — behzad.nouri, Mar 31 '14 at 23:54
I had tried .reset_index(inplace=True) but that just produced an error message. However, using .reset_index() worked a treat! Thank you very much for the rapid response. — user1718097, Apr 01 '14 at 00:27

waitingkuo · Answer 1 · 2014-04-02T02:50:22.980

4

You can use get_level_values to get the hospital column. You can pass either the number of level or the name of level, i.e. 0 or hospital

Then you can get what you want by:

In [38]: example_pivot_table[ example_pivot_table.index.get_level_values('hospital') == 'h7' ]
Out[38]: 
hospital  region
h7        r3        9
Name: patient, dtype: int64

Update

To get the regions, you can do

example_pivot_table[ example_pivot_table.index.get_level_values('hospital') == 'h7' ]['regions']

edited Apr 02 '14 at 02:50

answered Apr 01 '14 at 03:11

waitingkuo

89,478
28
112
118

Thanks waitingkuo. This isn't quite what I wanted as I wanted to be able to get just the 'r3' bit. However, the method outlined will be very useful in future and it also helps me to understand how data can be manipulated in pandas. Thanks again. – user1718097 Apr 02 '14 at 00:49

LondonRob · Answer 2 · 2014-09-11T11:20:20.590

First up, this isn't a pivot table job, it's a groupby job.

Pivot tables are for re-shaping your data when you haven't set an index (see this doc article), stack and unstack are for reshaping when you have set an index, and groupby is for aggregating (which is what this is) and split-apply-combine operations.

Here's how to get your patient counts using groupby:

>>> patient_count = df.groupby(['hospital', 'region']).count()
>>> print patient_count
                 patient
hospital region         
h1       r1            3
h2       r1            4
h3       r1            5
h4       r2            6
h5       r2            7
h6       r2            8
h7       r3            9
h8       r3           10
h9       r3           11

To select certain rows in a multi-index, I usually use ix as follows:

>>> h7 = patient_count.ix['h7']
>>> print h7
        patient
region         
r3            9

And now you can use get_level_values

>>> h7.index.values[0]
'r3'

Or, if you don't want the multi-indexed version (and, for your purposes you probably don't) you can do:

>>> patient_count = patient_count.reset_index()

Which allows you to find which region hospital h7 is in as follows:

>>> patient_count.region[patient_count.hospital == 'h7']
6    r3
Name: region, dtype: object

If you just want r3 you can do:

>>> patient_count.region[patient_count.hospital == 'h7'].values[0]
'r3'

Note that reset_index doesn't happen in-place which makes it great for chaining methods like this:

>>> patient_count.ix['h7'].reset_index().region[0]
'r3'

Thanks very much for the reply. That's very helpful. But I don't quite understand the difference between a pivot table and group by. When should each be used? — user1718097, Sep 10 '14 at 22:51
@user1718097 I've updated the answer to reflect my attempt to answer your comment question. — LondonRob, Sep 11 '14 at 11:09

score 1 · Answer 3 · answered May 11 '17 at 16:33

This will do the trick :

levels = example_pivot_table.columns.levels
labels = example_pivot_table.columns.labels
example_pivot_table.columns = levels[1][labels[1]]
example_pivot_table.reset_index(inplace=True)
example_pivot_table

So,find the levels and labels in your pivot table assign column names and reset index in it. The final result should be the result dataframe of the pivot.

How do I convert a pandas pivot table to a dataframe

3 Answers3

Update