49

I would like to sum (marginalize) over one level in a series with a 3-level multiindex to produce a series with a 2 level multiindex. For example, if I have the following:

ind = [tuple(x) for x in ['ABC', 'ABc', 'AbC', 'Abc', 'aBC', 'aBc', 'abC', 'abc']]
mi = pd.MultiIndex.from_tuples(ind)
data = pd.Series([264, 13, 29, 8, 152, 7, 15, 1], index=mi)

A  B  C    264
      c     13
   b  C     29
      c      8
a  B  C    152
      c      7
   b  C     15
      c      1

I would like to sum over the C variables to produce the following output:

A  B    277
   b     37
a  B    159
   b     16

What is the best way in Pandas to do this?

cottontail
  • 10,268
  • 18
  • 50
  • 51
dylkot
  • 2,275
  • 2
  • 20
  • 24

2 Answers2

60

If you know you always want to aggregate over the first two levels, then this is pretty easy:

In [27]: data.groupby(level=[0, 1]).sum()
Out[27]:
A  B    277
   b     37
a  B    159
   b     16
dtype: int64
chrisaycock
  • 36,470
  • 14
  • 88
  • 125
  • 2
    How do I merge the names of the levels into column/index names? – denfromufa Jun 28 '17 at 17:30
  • 2
    In the meantime since @Jeff's comment, `df.sum(level=...)` has become deprecated and throws a warning that `df.groupby(levels=...).sum()` is the proper way to do it. – Antimon Apr 03 '22 at 18:11
  • @Jeff Put your comment in a proper answer so people can upvote it and it becomes king. – Philipp Sep 08 '22 at 08:50
0

Another possibility is to unstack the Series into a dataframe and sum horizontally.

data.unstack().sum(axis=1)


A  B    277
   b     37
a  B    159
   b     16
dtype: int64

The level to unstack on must be the level(s) who values are to be summed up. So for example, the following two are equivalent.

x = data.unstack(level=0).sum(axis=1)
y = data.groupby(level=[1,2]).sum()

x.equals(y) # True
cottontail
  • 10,268
  • 18
  • 50
  • 51