2

I have a dataframe and i want to group its "First" and "Second" column and then to produce the expected output as mentioned below:

df = pd.DataFrame({'First':list('abcababcbc'), 'Second':list('qeeeeqqqeq'),'Value_1':np.random.randint(4,50,10),'Value_2':np.random.randint(40,90,10)})
print(df)

Output>

   First    Second  Value_1     Value_2
0   a         q        17         70
1   b         e        44         47
2   c         e         5         56
3   a         e        23         58
4   b         e        10         76
5   a         q        11         67
6   b         q        21         84
7   c         q        42         67
8   b         e        36         53
9   c         q        16         63

When i Grouped this DataFrame using groupby, I am getting below output:

def func(arr,columns):
    return arr.sort_values(by = columns).drop(columns, axis = 1)

df.groupby(['First','Second']).apply(func, columns = ['First','Second'])

                Value_1      Value_2
First Second            
a       e     3     23         58
        q     0     17         70
              5     11         67
b       e     1     44         47
              4     10         76
              8     36         53
        q     6     21         84
c       e     2     5          56
        q     7     42         67
              9     16         63

However i want below output:

Expected output:

                Value_1      Value_2
First Second            
a       e     3     23         58
            All     23         58
        q     0     17         70
              5     11         67
            All     28        137   
b       e     1     44         47
              4     10         76
              8     36         53
            All     90         176
        q     6     21         84
            All     21         84
c       e     2     5          56
            All     5          56
        q     7     42         67
              9     16         63
            All     58         130

It's not necessary to print "All" string but to print the sum of all grouped rows.

Alpha Green
  • 98
  • 10

3 Answers3

2
df = pd.DataFrame({'First':list('abcababcbc'), 'Second':list('qeeeeqqqeq'),'Value_1':np.random.randint(4,50,10),'Value_2':np.random.randint(40,90,10)})

  First Second  Value_1  Value_2
0     a      q        4       69
1     b      e       20       74
2     c      e       13       82
3     a      e        9       41
4     b      e       11       79
5     a      q       32       77
6     b      q        6       75
7     c      q       39       62
8     b      e       26       80
9     c      q       26       42
def lambda_t(x):
    df = x.sort_values(['First','Second']).drop(['First','Second'],axis=1)
    df.loc['all'] = df.sum()
    return df

df.groupby(['First','Second']).apply(lambda_t)


                  Value_1  Value_2
First Second                      
a     e      3          9       41
             all        9       41
      q      0          4       69
             5         32       77
             all       36      146
b     e      1         20       74
             4         11       79
             8         26       80
             all       57      233
      q      6          6       75
             all        6       75
c     e      2         13       82
             all       13       82
      q      7         39       62
             9         26       42
             all       65      104
fundchan
  • 36
  • 1
1

You can try this:

  1. reset the index in your group by:

    d1 = df.groupby(['First','Second']).apply(func, columns = ['First','Second']).reset_index()

  2. Then group by 'First' and 'Second' and sum the values columns.

    d2 = d.groupby(['First', 'Second']).sum().reset_index()

  3. Create the 'level_2' column in the new dataframe and concatenate with the initial one to get the desired result

    d2.loc[:,'level_2'] = 'All' pd.concat([d1,d2],0).sort_values(by = ['First', 'Second'])

Jorge
  • 2,181
  • 1
  • 19
  • 30
1

Not sure about your function; however, you could chunk it into two steps:

Create an indexed dataframe, where you append the First and Second columns to the existing index:

df.index = df.index.astype(str).rename("Total")
indexed = df.set_index(["First", "Second"], append=True).reorder_levels(
    ["First", "Second", "Total"]
)

indexed


                     Value_1    Value_2
First   Second  Total       
   a       q       0    17  70
   b       e       1    44  47
   c       e       2    5   56
   a       e       3    23  58
   b       e       4    10  76
   a       q       5    11  67
   b       q       6    21  84
   c       q       7    42  67
   b       e       8    36  53
   c       q       9    16  63

Create an aggregation, grouped by First and Second:

summary = (
    df.groupby(["First", "Second"])
    .sum()
    .assign(Total="All")
    .set_index("Total", append=True)
)

summary

                        Value_1     Value_2
  First     Second  Total       
      a        e    All     23      58
      q             All     28      137
      b        e    All     90      176
      q             All     21      84
      c        e    All     5       56
      q             All     58      130

Combine indexed and summary dataframes:

pd.concat([indexed, summary]).sort_index(level=["First", "Second"])

                     Value_1    Value_2
First   Second  Total        
    a   e         3     23  58
                All     23  58
    q             0     17  70
                  5     11  67
                All     28  137
    b   e         1     44  47
                  4     10  76
                  8     36  53
                All     90  176
    q             6     21  84
                All     21  84
    c   e         2     5   56
                All     5   56
    q             7     42  67
                  9     16  63
                All     58  130
sammywemmy
  • 27,093
  • 4
  • 17
  • 31