To avoid too many tiny slices in a pie chart, I need to merge/sum all elements in a series below a certain threshold. So far this is what I came up with:

from pandas import Series
import numpy as np

ser = Series(np.random.randint(100, size=10), index=list('abcdefghij')).sort_values(ascending=False)

thresh = 20
cleaned = ser[ser>=thresh].append(Series([ser[ser<thresh].sum()],
                                         index=["below {}".format(thresh)]))

This delivers the correct result, but the use of append bothers me and does not strike me as particularly pandas-like.

Is there a more appealing way to achieve the same result?

Update:

This is a solution based on the comment by IanS below.

ser.index = list(map(lambda item: item[0] if item[1] >= thresh else "below {}".format(thresh),
                     ser.items()))

or

ser.index = [x if y >= thresh else "below {}".format(thresh) for (x, y) in ser.items()]

and then

ser.groupby(ser.index).sum()
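
The relabel-then-group idea above can also be written without a Python-level loop. The following is only a sketch: the helper name `merge_small` and the sample values are made up for illustration, and it assumes that labels should be kept for values at or above the threshold, as in the question.

```python
import pandas as pd

def merge_small(ser, thresh):
    """Sum all entries below `thresh` into a single "below <thresh>" entry.

    `merge_small` is a hypothetical helper name, not part of pandas.
    """
    label = "below {}".format(thresh)
    # Keep the original label where the value is large enough,
    # otherwise substitute the shared label.
    keys = ser.index.where((ser >= thresh).to_numpy(), label)
    # Entries that now share a label are summed together.
    return ser.groupby(keys).sum().sort_values(ascending=False)

# Illustrative data only (not the random series from the question).
example = pd.Series([97, 5, 24, 16, 61], index=list("abcde"))
merged = merge_small(example, 20)
print(merged)
```

Unlike the index assignment above, this leaves the original series untouched, which may or may not matter for the pie chart use case.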
Patrick Allo

1 Answer

You can try this:

import pandas as pd

df = ser.groupby(ser>20).apply(lambda x:
                               x if (x>20).all()
                               else pd.Series(x.sum(),
                                              index=["below 20"])
                              ).reset_index().set_index("level_1"
                                                        ).iloc[:,1:][0].copy()

df.name = None
df.index.name=None
df.sort_values(ascending=False, inplace=True)
df
c           97
f           88
e           61
h           60
a           53
g           49
i           37
d           24
below 20    21
dtype: int64

But I'm not sure it's better than your solution.
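
For comparison, here is a sketch of the mask-based approach from the question written with `pd.concat`, which may be useful on pandas versions where `Series.append` is no longer available (it was removed in pandas 2.0). The function name `merge_below` and the sample values are assumptions made for illustration.

```python
import pandas as pd

def merge_below(ser, thresh):
    """Keep entries >= thresh and add one entry holding the sum of the rest.

    `merge_below` is a hypothetical helper name, not part of pandas.
    """
    mask = ser >= thresh
    # One-element series holding the total of everything under the threshold.
    rest = pd.Series([ser[~mask].sum()], index=["below {}".format(thresh)])
    return pd.concat([ser[mask], rest])

# Illustrative data only (not the random series from the question).
sample = pd.Series([97, 5, 24, 16, 61], index=list("abcde"))
cleaned = merge_below(sample, 20)
print(cleaned)
```

Note that this always adds the "below" entry, even when nothing falls under the threshold (its value is then 0), so a pie chart may need that case filtered out.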

jrjc