2

I am making a pivot table using pandas. If I set aggfunc=sum or aggfunc=count on a column of boolean values, it works fine provided there's at least one True in the column. E.g. [True, False, True, True, False] would return 3. However, if all the values are False, then the pivot table outputs False instead of 0. No matter what, I can't get around it. The only way I can circumvent it is to define a function follows:

def f(x):
    mySum = sum(x)
    return "0" if mySum == 0 else mySum

and then set aggfunc=lambda x: f(x). While that works visually, it still disturbs me that outputing a string is the only way I can get the 0 to stick. If I cast it as an int, or try to return 0.0, or do anything that's numeric at all, False is always the result.

Why is this, and how do I get the pivot table to actually give me 0 in this case (by only modifying the aggfunc, not the dataframe itself)?

Elliptica
  • 3,928
  • 3
  • 37
  • 68

1 Answers1

3
df = pd.DataFrame({'count': [False] * 12, 'index': [0, 1] * 6, 'cols': ['a', 'b', 'c'] * 4})
print(df)

outputs

   cols  count  index
0     a  False      0
1     b  False      1
2     c  False      0
3     a  False      1
4     b  False      0
5     c  False      1
6     a  False      0
7     b  False      1
8     c  False      0
9     a  False      1
10    b  False      0
11    c  False      1

You can use astype (docs) to cast to int before pivoting.

res = df.pivot_table(values='count', aggfunc=np.sum, columns='cols', index='index').astype(int)
print(res)

outputs

cols   a  b  c
index         
0      0  0  0
1      0  0  0
Alex
  • 18,484
  • 8
  • 60
  • 80
  • Thank you, that does work. However, my program can't change the dataframe (for reasons I won't go into here). Suffice to say, my solution has to rely on aggfunc only. Do you know a way to do it without changing the dataframe? – Elliptica Aug 04 '16 at 18:21
  • @Elliptica Sure, just move the `astype` call to the result. Answer updated. – Alex Aug 04 '16 at 18:26