3

There seem to be a lot of possibilities to pivot flat table data into a 3d array but I'm somehow not finding one that works: Suppose I have some data with columns=['name', 'type', 'date', 'value']. When I try to pivot via

pivot(index='name', columns=['type', 'date'], values='value')

I get

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Am I reading docs from dev pandas maybe? It seems like this is the usage described there. I am running 0.8 pandas.

I guess, I'm wondering if I have a MultiIndex ['x', 'y', 'z'] Series, is there a pandas way to put that in a panel? I can use groupby and get the job done, but then this is almost like what I would do in numpy to assemble an n-d array. Seems like a fairly generic operation so I would imagine it might be implemented already.

mathtick
  • 6,487
  • 13
  • 56
  • 101

2 Answers2

8

pivot only supports using a single column to generate your columns. You probably want to use pivot_table to generate a pivot table using multiple columns e.g.

pandas.tools.pivot.pivot_table(your_dataframe, values='value', index='name', columns=['type', 'date'], aggfunc='sum')

The hierarchical columns that are mentioned in the API reference and documentation for pivot relates to cases where you have multiple value fields rather than multiple categories.

Assuming 'type' and 'date' are categories, whose values should be used as the column names, then you should use pivot_table.

However, if you want separate columns for different value fields for the same category (e.g. 'type'), then you should use pivot without specifying the value column and your category as the columns parameter.

For example, suppose you have this DataFrame:

df = DataFrame({'name': ['A', 'B', 'A', 'B'], 'type': [1, 1, 2, 2], 'date': ['2012-01-01', '2012-01-01', '2012-02-01', '2012-02-01'],  'value': [1, 2, 3, 4]})

pt = df.pivot_table(values='value', index='name', columns=['type', 'date'])
p = df.pivot('name', 'type')

pt will be:

type           1           2
date  2012-01-01  2012-02-01
name                        
A              1           3
B              2           4

and p will be:

          date              value   
type           1           2      1  2
name                                  
A     2012-01-01  2012-02-01      1  3
B     2012-01-01  2012-02-01      2  4

NOTE: For pandas version < 0.14.0, the index and columns keyword arguments should be replaced with rows and cols respectively.

Community
  • 1
  • 1
Matti John
  • 19,329
  • 7
  • 41
  • 39
  • This seems to be the case although the docs seem to imply pivoting to a panel is possible: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html Maybe this is an intention that has not yet been implemented? – mathtick Nov 07 '12 at 14:17
  • I've expanded the answer to explain the differences, hope it helps. – Matti John Nov 07 '12 at 14:48
  • Thanks, I guess the pandas philosophy is that you should probably work with your 3d data in a 2d multiindex array ... probably not a bad idea once you get used to indexing and slicing on multiindexes. – mathtick Nov 07 '12 at 16:22
  • great answer, but now `rows` & `cols` are deprecated, please update the original answer to use `index` & `columns` instead. thanks – Haleemur Ali Jun 11 '14 at 22:36
  • ... it is probably more intuitive/readable to stick to using stack/unstacks and groupby as in the other solution below. I basically never use pivot_table – mathtick Nov 24 '14 at 22:32
4

The original post ended with the question:

"I'm wondering if I have a MultiIndex ['x', 'y', 'z'] Series, is there a pandas way to put that in a panel?"

to which I was looking for a solution myself.

I ended up with the following:

In [1]: import pandas as pd

## generate xyz example:
In [3]: df = pd.DataFrame({col:pd.np.random.randint(0,10,10) 
                               for col in ['x','y','z','data']})

## set all x,y,z coordinates as indices
In [5]: df.set_index(['x','y','z'], inplace=True)

## set the z coordinate as headers of the columns 
# NB: this is will turn the data into "dense" with NaNs where there were no 'data'
In [7]: df = df['data'].unstack()

## now it is ready to be "pivot"ed into a panel
In [9]: data_panel = df.to_panel()

In [10]: df
Out[10]: 
     data                        
z       1   3   4   5   6   7   9
x y                              
1 5   NaN NaN NaN NaN NaN NaN   1
  6   NaN NaN NaN NaN NaN NaN   0
2 9   NaN NaN NaN NaN NaN   1 NaN
3 9     6 NaN NaN NaN NaN NaN NaN
5 9   NaN NaN NaN NaN NaN NaN   8
7 1   NaN NaN NaN NaN   8 NaN NaN
  3   NaN NaN NaN NaN NaN NaN   5
  7   NaN NaN NaN   1 NaN NaN NaN
  9   NaN   0 NaN NaN NaN NaN NaN
9 5   NaN NaN   1 NaN NaN NaN NaN

[10 rows x 7 columns]

In [11]: data_panel
Out[11]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 7 (items) x 6 (major_axis) x 6 (minor_axis)
Items axis: 1 to 9
Major_axis axis: 1 to 9
Minor_axis axis: 1 to 9

The columns headers will be the Items of the Panel, the first level index with be the MajorAxis (rows) and the second level will be the MinorAxis (columns)

eldad-a
  • 3,051
  • 3
  • 22
  • 25