14

I have some entries in dataframe like :

name, age, phonenumber
 A,10, Phone1
 A,10,Phone2
 B,21,PhoneB1
 B,21,PhoneB2
 C,23,PhoneC

Here is what I am trying to achieve as result of pivot table:

 name, age, phonenumbers, phonenocount
 A,10, "Phone1,Phone2" , 2
 B,21,  "PhoneB1,PhoneB2", 2
 C,23, "PhoneC" , 1

I was trying something like :

pd.pivot_table(phonedf, index=['name','age','phonenumbers'], values=['phonenumbers'], aggfunc=np.size)

however I want the phone numbers to be concatenated as part of aggfunc. Any Suggestions ?

Psidom
  • 209,562
  • 33
  • 339
  • 356
  • 1
    This seems like a groupby question rather than pivot-table. Do you put anything into columns in the actual dataframe? – ayhan Aug 16 '16 at 18:27
  • couldn't understand your question "Do you put anything into columns in the actual dataframe?". Please help to understand more – Sivaswami Jeganathan Aug 16 '16 at 18:30
  • A pivot table has indices, columns and values. So you would take some rows and turn them into columns for example. If all of them stay as rows then this can be considered a group-by operation. But since you used pivot_table I thought maybe you had some extra columns that you didn't include in the question. – ayhan Aug 16 '16 at 18:35

3 Answers3

13

You can use agg function after the groupby:

df.groupby(['name', 'age'])['phonenumber'].\
    agg({'phonecount': pd.Series.nunique, 
         'phonenumber': lambda x: ','.join(x)
        }
       )

#               phonenumber  phonecount
# name  age     
#    A   10   Phone1,Phone2           2
#    B   21 PhoneB1,PhoneB2           2
#    C   23          PhoneC           1

Or a shorter version according to @root and @Jon Clements:

df.groupby(['name', 'age'])['phonenumber'].\
   agg({'phonecount': 'nunique', 'phonenumber': ','.join})
yoonghm
  • 4,198
  • 1
  • 32
  • 48
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • 2
    If you want to make things slightly more concise, you could just write `'nunique'` as a string instead of `pd.Series.nunique`. – root Aug 16 '16 at 18:40
  • 2
    @Psidom you can also use `','.join` directly instead of a lambda... so this becomes `df.groupby(['name', 'age'])['phonenumber'].agg({'phonecount': 'nunique', 'phonenumber': ','.join})` – Jon Clements Aug 16 '16 at 18:49
  • Thanks that works. I was also trying pivot table solution as mentioned earlier. Here is something I came : pd.pivot_table(df,index=['name','age'],aggfunc=[np.size,np.unique]) – Sivaswami Jeganathan Aug 16 '16 at 21:00
  • Panda version 0.23.4 gave this warning: `FutureWarning: using a dict on a Series for aggregation is deprecated and will be removed in a future version`. Another solution to avoid the issue is `df.groupby(['name', 'age']).agg({'phonenumber': [('Count', 'nunique'), ('Phonenumber', ','.join)]})` – yoonghm Dec 02 '18 at 14:40
  • Is it possible to add a new phone number to `phonenumber` only if it is not already in the list? For example, `Phone2` is `Phone1`. – yoonghm Dec 02 '18 at 15:59
  • I got it: `lambda x: ','.join(set(x.tolist()))` – yoonghm Dec 02 '18 at 16:24
3

This answer comes from here: https://medium.com/@enricobergamini/creating-non-numeric-pivot-tables-with-python-pandas-7aa9dfd788a7

Kudos to Enrico Bergamini for writing about this. I was struggling with this too.

Define the input first.

df = pd.DataFrame({'name':['a','a','b','b','c'], 
                   'age':[10, 10, 21, 21, 23], 
                   'phonenumber':['phone1', 'phone2', 'phoneb1', 'phoneb2',
                                  'phonec']})

Use pandas pivot_table to re-shape as you want.

temp = pd.pivot_table(df, index=['name', 'age'], values='phonenumber',
                      aggfunc=[len, lambda x: ",".join(str(v) for v in x)])

Output:

                 len         <lambda>
         phonenumber      phonenumber
name age                             
a    10            2    phone1,phone2
b    21            2  phoneb1,phoneb2
c    23            1           phonec

If you want to drop the multiindex in the columns, use this: temp.columns = temp.columns.droplevel()

After you drop the functions from the column index, you can rename them easily.

temp.columns = ['count', 'concat']

New stored variable is:

          count           concat
name age                        
a    10       2    phone1,phone2
b    21       2  phoneb1,phoneb2
c    23       1           phonec
Foggy
  • 383
  • 2
  • 12
0

The pivot table uses df for data and phone for index and concatenates rows of code in a string variable. I used a list comprehension after aggregating to rename the resulting columns

fp=pd.pivot_table(data=df,index=["Phone"],values=["Code"],aggfunc=[len,  lambda x: ", ".join(str(v) for v in x)])
fp.columns =["# of Codes" if str(column)=="('len', 'NewCode')" else str(column) for column in fp.columns.tolist()]
fp.columns =["Spec Code" if str(column)=="('<lambda>', 'NewCode')" else str(column) for column in fp.columns.tolist()]
Golden Lion
  • 3,840
  • 2
  • 26
  • 35