
GOAL

  • I have a pandas dataframe with float and object types.

  • I want to group the dataframe by the 'name' column: groupped = df.groupby(["name"])

  • Then aggregate all the other columns.

  • There are columns with float values that I sum together

  • But I also have object-type columns, and the goal is to keep just one value per group, e.g. the first one (they are all the same within a group). I am trying to use min, but it doesn't work, and I cannot find any other function that works with object types.

aggregated = groupped.agg({ 
         'name' : ['min'],
         'id' : ['min'],
         'date' : ['min'],
         'number_one' : ['sum'],
         'type' : ['min'],
         'number_two' : ['sum'],
})

ERROR

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-102-3594b7bd0c31> in <module>
      9          'number_one' : ['sum'],
     10          'type' : ['min'],
---> 11          'number_two' : ['sum'],
     12 })
     13 
...
TypeError: '<=' not supported between instances of 'str' and 'float'

Already Tried

sogu

1 Answer


The first idea is to use GroupBy.first for the object columns:

aggregated = groupped.agg({ 
         'name' : ['first'],
         'id' : ['first'],
         'date' : ['first'],
         'number_one' : ['sum'],
         'type' : ['first'],
         'number_two' : ['sum'],
})
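A minimal sketch of this on made-up data (the values and the NaN below are assumptions, not the asker's frame). The TypeError in the question suggests that an object column mixes strings with a float, most likely NaN, which min has to compare; first does no comparison and, by default, returns the first non-null value per group. The grouping key 'name' is left out of the dict here because it already becomes the index, and some pandas versions may reject it as an aggregation key.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'type' is an object column with a missing value.
df = pd.DataFrame({
    "name": ["a", "a", "b"],
    "id": ["x1", "x1", "y2"],
    "type": ["t", np.nan, "u"],
    "number_one": [1.0, 2.0, 3.0],
    "number_two": [10.0, 20.0, 30.0],
})

groupped = df.groupby("name")

# Lists of functions produce MultiIndex columns such as ('type', 'first').
aggregated = groupped.agg({
    "id": ["first"],
    "type": ["first"],
    "number_one": ["sum"],
    "number_two": ["sum"],
})
print(aggregated)
```

Individual results can then be selected with a tuple, for example aggregated[("type", "first")].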

To avoid a MultiIndex in the columns, remove the []:

aggregated = groupped.agg({ 
         'name' : 'first',
         'id' : 'first',
         'date' : 'first',
         'number_one' : 'sum',
         'type' : 'first',
         'number_two' : 'sum',
})
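As a small illustration of the difference, the sketch below uses plain strings instead of lists on made-up data (the frame is an assumption) and then calls reset_index() to turn the group key back into an ordinary column.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["a", "a", "b"],
    "id": ["x1", "x1", "y2"],
    "number_one": [1.0, 2.0, 3.0],
})

# A plain string per column keeps the column labels flat.
aggregated = df.groupby("name").agg({
    "id": "first",
    "number_one": "sum",
})

# 'name' is the index after grouping; reset_index() restores it as a column.
flat = aggregated.reset_index()
print(flat)
```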

A more general solution is to sum the numeric columns and take the first value of all other columns in a lambda function:

import numpy as np
# Sum numeric columns; keep the first value of every other column.
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else x.iat[0]
aggregated = groupped.agg(f)
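A possible variant of the same idea, sketched on made-up data: instead of a lambda, build the aggregation dict from the column dtypes with pandas.api.types.is_numeric_dtype. This is a swapped-in technique, not the code above; the names value_cols and agg_map are hypothetical. One difference to be aware of: 'first' skips missing values by default, while x.iat[0] returns whatever sits in the first row of the group, NaN included.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["a", "a", "b"],
    "type": ["t", np.nan, "u"],
    "number_one": [1.0, 2.0, 3.0],
})

# Every column except the grouping key gets an aggregation chosen by dtype.
value_cols = [c for c in df.columns if c != "name"]
agg_map = {
    c: ("sum" if is_numeric_dtype(df[c]) else "first")
    for c in value_cols
}

aggregated = df.groupby("name").agg(agg_map)
print(aggregated)
```

Printing agg_map before aggregating makes it easy to check which rule each column received.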
jezrael