Pandas Pivot table, Dealing with Multiple Values in column

Question

I am working with data that contains multiple UNSPSC codes each with a different Price. I want to get a pivot table that combines all of the prices per UNSPSC code.

Ultimately, I want to extract the top 5 UNSPSCs by their highest price.

UNSPSC= pivot_table(analysis, rows=['UNSPSC'], cols =['Extended Price'])

This definitely doesn't work. I don't want all of the columns it provides. I just want two. UNSPSC and then the aggregate price associate with the unique code.

<class 'pandas.core.frame.DataFrame'>
Index: 78 entries, 12142100.0 to 55121611.0
Columns: 21924 entries, ($ Difference, 2.70000004768) to (Quantity, 3255.19995117)
dtypes: float64(21924)

I also tried to pivot by

UNSPSC =analysis.pivot(index = 'UNSPSC', columns ='Extended Price')

My error was ReshapeError: Index contains duplicate entries, cannot reshape

I tried to utilize groupby and aggfun=np.sum but I can't seem to get what I am looking for. How can I get a pivot table with that shows the sum of the prices to each unique UNSPSC?

Thanks!

score 0 · Answer 1 · answered Oct 17 '13 at 19:02

Not sure it's what you want, but try to combine groupby and aggregate:

>>> analysis = pd.DataFrame({"UNSPSC":["aa", "aa", "bb", "bb", "bb"], "ExtendedPrice": [1.1, 5.1, 3.5, 4.3, 3.0]})
>>> analysis.groupby('UNSPSC').aggregate(np.sum)
        ExtendedPrice
UNSPSC               
aa                6.2
bb               10.8

score 0 · Answer 2 · answered Oct 17 '13 at 21:44

from pandas import pivot_table
import numpy as np


UNSPSC = pivot_table( analysis, values = 'Extended Price', rows = 'UNSPSC', aggfunc = np.sum)


Pivot = UNSPSC.order(ascending = False)
 UNSPSC
12352200    350045.593750
12352100    165760.484375
12352300     96212.031250
12190000     68791.710938
12352000     62938.332031

This ended up working for me with the above results

Pandas Pivot table, Dealing with Multiple Values in column

2 Answers2