2

I am trying to build a table that has groups that are divided by subgroups with count and average for each subgroup. For example, I want to convert the following data frame:

To a table that looks like this where the interval is a bigger group and columns a thru i become subgroups within the group with the corresponding subgroups' count and average in each cell:

I have tried this with no success:

user9532692
  • 584
  • 7
  • 28

3 Answers3

2

Try.

df.groupby(['interval']).apply(lambda x : x.stack()
                                           .groupby(level=-1)
                                           .agg({'count', 'mean'}))

Use groupby with apply to apply a function for each group then stack and groupby again with agg to find count and mean.

ResidentSleeper
  • 2,385
  • 2
  • 10
  • 20
  • I get an error message saying **TypeError: count() takes 1 positional argument but 2 were given** – user9532692 Apr 13 '19 at 17:17
  • @user9532692 It worked fine for me. I'm not sure what happened. Can you try to use only `mean` or replace `count` with `size`? – ResidentSleeper Apr 13 '19 at 17:51
  • I am still getting an error message saying **DataError: No numeric types to aggregate**. I wondered if it's because the data values are not in integer, but running *type(hh.iloc[1,1])* returns *numpy.int64* – user9532692 Apr 13 '19 at 22:01
1

Use DataFrame.melt with GroupBy.agg and tuples for aggregate functions with new columns names:

df1 = (df.melt('interval', var_name='source')
         .groupby(['interval','source'])['value']
         .agg([('cnt','count'), ('average','mean')])
         .reset_index())
print (df1.head())
  interval source  cnt  average
0        0      a    1      5.0
1        0      b    1      0.0
2        0      c    1      0.0
3        0      d    1      0.0
4        0      f    1      0.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • The result looks like it's counting items that fall into each interval only. How can you make it to count items that fall into both _interval_ and _source_? – user9532692 Apr 14 '19 at 22:11
  • @user9532692 - I am confused with output, can you explain row `6~9 a 2 7` from output? why `2`? why `7` ? – jezrael Apr 15 '19 at 07:08
  • 2 is the count every item that falls in the interval 6~9 **AND** source a. 7 is the average of those two items that falls in the same group (interval 6~9 AND where source is a) – user9532692 Apr 20 '19 at 00:01
0

The following code solves the problem I asked for:

df.group(['interval'],,as_index=False).agg({
                                                'a':{"count":"mean"},
                                                'b':{"count":"mean"},
                                                'c':{"count":"mean"},
                                                'd':{"count":"mean"},
                                                'f':{"count":"mean"},
                                                'g':{"count":"mean"},
                                                'i':{"count":"mean"}
                                            })
user9532692
  • 584
  • 7
  • 28