Python Pandas : Pivot table : aggfunc concatenate instead of np.size or np.sum

Question

I have some entries in dataframe like :

name, age, phonenumber
 A,10, Phone1
 A,10,Phone2
 B,21,PhoneB1
 B,21,PhoneB2
 C,23,PhoneC

Here is what I am trying to achieve as result of pivot table:

 name, age, phonenumbers, phonenocount
 A,10, "Phone1,Phone2" , 2
 B,21,  "PhoneB1,PhoneB2", 2
 C,23, "PhoneC" , 1

I was trying something like :

pd.pivot_table(phonedf, index=['name','age','phonenumbers'], values=['phonenumbers'], aggfunc=np.size)

however I want the phone numbers to be concatenated as part of aggfunc. Any Suggestions ?

This seems like a groupby question rather than pivot-table. Do you put anything into columns in the actual dataframe? — ayhan, Aug 16 '16 at 18:27
couldn't understand your question "Do you put anything into columns in the actual dataframe?". Please help to understand more — Sivaswami Jeganathan, Aug 16 '16 at 18:30
A pivot table has indices, columns and values. So you would take some rows and turn them into columns for example. If all of them stay as rows then this can be considered a group-by operation. But since you used pivot_table I thought maybe you had some extra columns that you didn't include in the question. — ayhan, Aug 16 '16 at 18:35

score 13 · Accepted Answer · edited Dec 02 '18 at 16:59

13

You can use agg function after the groupby:

df.groupby(['name', 'age'])['phonenumber'].\
    agg({'phonecount': pd.Series.nunique, 
         'phonenumber': lambda x: ','.join(x)
        }
       )

#               phonenumber  phonecount
# name  age     
#    A   10   Phone1,Phone2           2
#    B   21 PhoneB1,PhoneB2           2
#    C   23          PhoneC           1

Or a shorter version according to @root and @Jon Clements:

df.groupby(['name', 'age'])['phonenumber'].\
   agg({'phonecount': 'nunique', 'phonenumber': ','.join})

edited Dec 02 '18 at 16:59

yoonghm

4,198
1
32
48

answered Aug 16 '16 at 18:36

Psidom

209,562
33
339
356

2

If you want to make things slightly more concise, you could just write `'nunique'` as a string instead of `pd.Series.nunique`. – root Aug 16 '16 at 18:40
2

@Psidom you can also use `','.join` directly instead of a lambda... so this becomes `df.groupby(['name', 'age'])['phonenumber'].agg({'phonecount': 'nunique', 'phonenumber': ','.join})` – Jon Clements Aug 16 '16 at 18:49
Thanks that works. I was also trying pivot table solution as mentioned earlier. Here is something I came : pd.pivot_table(df,index=['name','age'],aggfunc=[np.size,np.unique]) – Sivaswami Jeganathan Aug 16 '16 at 21:00
Panda version 0.23.4 gave this warning: `FutureWarning: using a dict on a Series for aggregation is deprecated and will be removed in a future version`. Another solution to avoid the issue is `df.groupby(['name', 'age']).agg({'phonenumber': [('Count', 'nunique'), ('Phonenumber', ','.join)]})` – yoonghm Dec 02 '18 at 14:40
Is it possible to add a new phone number to `phonenumber` only if it is not already in the list? For example, `Phone2` is `Phone1`. – yoonghm Dec 02 '18 at 15:59
I got it: `lambda x: ','.join(set(x.tolist()))` – yoonghm Dec 02 '18 at 16:24

score 3 · Answer 2 · answered Jun 10 '22 at 22:53

This answer comes from here: https://medium.com/@enricobergamini/creating-non-numeric-pivot-tables-with-python-pandas-7aa9dfd788a7

Kudos to Enrico Bergamini for writing about this. I was struggling with this too.

Define the input first.

df = pd.DataFrame({'name':['a','a','b','b','c'], 
                   'age':[10, 10, 21, 21, 23], 
                   'phonenumber':['phone1', 'phone2', 'phoneb1', 'phoneb2',
                                  'phonec']})

Use pandas pivot_table to re-shape as you want.

temp = pd.pivot_table(df, index=['name', 'age'], values='phonenumber',
                      aggfunc=[len, lambda x: ",".join(str(v) for v in x)])

Output:

                 len         <lambda>
         phonenumber      phonenumber
name age                             
a    10            2    phone1,phone2
b    21            2  phoneb1,phoneb2
c    23            1           phonec

If you want to drop the multiindex in the columns, use this: temp.columns = temp.columns.droplevel()

After you drop the functions from the column index, you can rename them easily.

temp.columns = ['count', 'concat']

New stored variable is:

          count           concat
name age                        
a    10       2    phone1,phone2
b    21       2  phoneb1,phoneb2
c    23       1           phonec

score 0 · Answer 3 · answered Feb 02 '23 at 16:40

The pivot table uses df for data and phone for index and concatenates rows of code in a string variable. I used a list comprehension after aggregating to rename the resulting columns

fp=pd.pivot_table(data=df,index=["Phone"],values=["Code"],aggfunc=[len,  lambda x: ", ".join(str(v) for v in x)])
fp.columns =["# of Codes" if str(column)=="('len', 'NewCode')" else str(column) for column in fp.columns.tolist()]
fp.columns =["Spec Code" if str(column)=="('<lambda>', 'NewCode')" else str(column) for column in fp.columns.tolist()]

Python Pandas : Pivot table : aggfunc concatenate instead of np.size or np.sum

3 Answers3

Linked