
I have a DataFrame that is purely boolean, with two columns: IsInTime and IsGoodKine.

      IsInTime      IsGoodKine
0         True            True
1         True            True
2         True            True
3         True           False
4         True           False
5         True            True
6         True            True
7         True            True
8         True            True
9         True            True
...

When I call df.dtypes I get "bool" for every column. However, when I call df["IsInTime"].value_counts(), the Series I get back does not have two entries. Instead I get:

True     3846372
False     172188
True           6

Why is pandas splitting True into two separate counts here, and how can I stop it?
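
One way to check for labels that print identically but are counted separately is to look at the type of each value. The sketch below uses a hypothetical object-dtype Series that mixes real booleans with the string "True"; that mixture is an assumed cause chosen for illustration, not a confirmed diagnosis of this data set.

```python
import pandas as pd

# Hypothetical column (assumption): object dtype mixing Python bools
# with the string "True", which prints exactly like the boolean True.
s = pd.Series([True, True, "True", False], dtype=object)

# value_counts() keys on equality, and True != "True",
# so the printed output shows "True" twice.
counts = s.value_counts()
print(counts)

# Counting the element type names exposes the mixture.
type_counts = s.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# repr() makes the difference visible in the labels: 'True' vs True.
print(s.map(repr).value_counts())
```

Running the same type-name check on df["IsInTime"] would show whether more than one kind of value is present in the column.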

I have inspected the data types of the DataFrame and even tried casting the column to boolean. It still doesn't fix the issue, i.e.,

df["IsInTime"].astype("bool").value_counts() # Doesn't work
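
A caveat on that cast, assuming the column might hold strings: `astype("bool")` on an object column uses Python truthiness, so every non-empty string, including "False", becomes True. A minimal sketch of a safer normalization, assuming the only values present are real booleans and the strings "True"/"False" (the sample Series is hypothetical):

```python
import pandas as pd

# Hypothetical mixed column (assumption), e.g. after concatenating sources.
s = pd.Series([True, "True", "False", False], dtype=object)

# astype(bool) applies truthiness element-wise: the string "False" becomes True.
naive = s.astype(bool)
print(naive.tolist())

# Safer: compare on the text form; any unexpected spelling becomes NaN.
normalized = s.astype(str).str.strip().map({"True": True, "False": False})
if normalized.isna().any():
    raise ValueError("column contains values other than True/False")
normalized = normalized.astype(bool)
print(normalized.value_counts())
```

The explicit NaN check is a design choice: an unexpected spelling fails loudly instead of being silently cast to True.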

Edit: The following also fails, for a similar reason:

df.groupby(['IsGoodKine', 'IsInTime']).size()

It returns:

        IsGoodKine         IsInTime
False        False             7633
             True            164555
True         True                 6
             False           133369
             True           3713003
dtype: int64

Note: I cannot simply add the two True counts together, because what I actually want is the number of entries across the two columns for each logical combination: (true, true), (true, false), (false, true) and (false, false). But when I call size() or value_counts() I get more than four combinations, which can't be correct.
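
A minimal sketch of counting all four combinations, using the ten sample rows from the question and ordering the keys as (IsInTime, IsGoodKine). It first collapses any duplicated index entries (the workaround suggested in the comments), then reindexes over every pair so combinations that never occur show up as 0:

```python
import pandas as pd

# The ten sample rows shown in the question.
df = pd.DataFrame({
    "IsInTime": [True] * 10,
    "IsGoodKine": [True, True, True, False, False, True, True, True, True, True],
})

cols = ["IsInTime", "IsGoodKine"]
counts = (
    df.groupby(cols).size()
    # Merge index entries that compare equal, summing their counts.
    .groupby(level=[0, 1]).sum()
)

# Force every logical combination to appear, filling missing ones with 0.
full_index = pd.MultiIndex.from_product([[False, True], [False, True]], names=cols)
counts = counts.reindex(full_index, fill_value=0)
print(counts)
```

The collapse step matters here because `reindex` refuses to work on an index with duplicate labels. `counts.unstack()` reshapes the same result into a 2x2 table if that layout is easier to read.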

  • Do you need `df.groupby(['IsInTime', 'IsGoodKine']).size()` ? – jezrael Dec 14 '22 at 09:58
  • Essentially yes, but I get the same issue. If I run that I get this as output: IsInTime IsGoodKine False False 7633 True 164555 True True 6 False 133369 True 3713003 dtype: int64 – Chandler Kenworthy Dec 14 '22 at 10:01
  • What is the expected output for the sample data? – jezrael Dec 14 '22 at 10:02
  • Basically the result of df.groupby(['IsGoodKine', 'IsInTime']).size().reset_index() but without the extra weird combinations. It should have 4 entries, (true, true)=8, (true, false)=2, (false, false)=0 and (false, true)=0, for the sample data given. – Chandler Kenworthy Dec 14 '22 at 10:07
  • So need first `df[['IsGoodKine', 'IsInTime']] = df[['IsGoodKine', 'IsInTime']].astype(bool)` and then `df.groupby(['IsInTime', 'IsGoodKine']).size()` ? – jezrael Dec 14 '22 at 10:08
  • I just tried this but I get the same result as above. Basically I still have two counts for `true & true` one says 6 and another 3713003. – Chandler Kenworthy Dec 14 '22 at 10:11
  • What is `print (df.groupby(['IsGoodKine', 'IsInTime']).size().index.tolist())` ? – jezrael Dec 14 '22 at 10:14
  • [(False, False), (False, True), (True, True), (True, False), (True, True)] – Chandler Kenworthy Dec 14 '22 at 10:21
  • No idea why `True, True` is duplicated. One idea: can you try `df.groupby(['IsGoodKine', 'IsInTime']).size().groupby(level=[0,1]).sum()` ? – jezrael Dec 14 '22 at 10:29
  • This worked! `IsGoodKine IsInTime False False 1 True 46299 True False 532 True 9988162` Could you explain why this worked? – Chandler Kenworthy Dec 14 '22 at 10:31
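
The workaround regroups the already-computed counts by their own index levels. Assuming the two (True, True) rows carry labels that compare equal, the second groupby places them in one group and `sum()` adds their counts, so the duplicate disappears. It hides the symptom rather than explaining why the first groupby produced two rows. The sketch below rebuilds the reported result from the numbers in the question:

```python
import pandas as pd

# Rebuild the reported output: (True, True) appears twice in the index.
idx = pd.MultiIndex.from_tuples(
    [(False, False), (False, True), (True, True), (True, False), (True, True)],
    names=["IsGoodKine", "IsInTime"],
)
sizes = pd.Series([7633, 164555, 6, 133369, 3713003], index=idx)

# Two entries share the label (True, True), so the index is not unique.
print(sizes.index.is_unique)

# Grouping by both index levels puts equal labels in one group and sums them.
merged = sizes.groupby(level=[0, 1]).sum()
print(merged)
```

The two (True, True) counts, 6 and 3713003, end up as a single entry of 3713009, and the total number of rows is unchanged.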
