26

Is there a pythonic way to group by a field and check if all elements of each resulting group have the same value?

Sample data:

              datetime rating  signal
0  2018-12-27 11:33:00     IG       0
1  2018-12-27 11:33:00     HY      -1
2  2018-12-27 11:49:00     IG       0
3  2018-12-27 11:49:00     HY      -1
4  2018-12-27 12:00:00     IG       0
5  2018-12-27 12:00:00     HY      -1
6  2018-12-27 12:49:00     IG       0
7  2018-12-27 12:49:00     HY      -1
8  2018-12-27 14:56:00     IG       0
9  2018-12-27 14:56:00     HY      -1
10 2018-12-27 15:12:00     IG       0
11 2018-12-27 15:12:00     HY      -1
12 2018-12-20 15:14:00     IG       0
13 2018-12-20 15:14:00     HY      -1
14 2018-12-20 15:50:00     IG      -1
15 2018-12-20 15:50:00     HY      -1
16 2018-12-27 13:26:00     IG       0
17 2018-12-27 13:26:00     HY      -1
18 2018-12-27 13:44:00     IG       0
19 2018-12-27 13:44:00     HY      -1
20 2018-12-27 15:06:00     IG       0
21 2018-12-27 15:06:00     HY      -1
22 2018-12-20 15:48:00     IG       0
23 2018-12-20 15:48:00     HY      -1

The grouping part can be done by

df.groupby([datetime.dt.date,'rating'])

However, I'm sure there must be a simple way to leverage the grouper and use a transform statement to return 1 if all the values from signal are the same.

Desired output

2018-12-20  HY            True
            IG            False
2018-12-27  HY            True
            IG            True
cs95
  • 379,657
  • 97
  • 704
  • 746
Yuca
  • 6,010
  • 3
  • 22
  • 42

2 Answers2

35

Use groupby and nunique, and check whether the result is 1:

df.groupby([df.datetime.dt.date, 'rating']).signal.nunique().eq(1)

datetime    rating
2018-12-20  HY         True
            IG        False
2018-12-27  HY         True
            IG         True
Name: signal, dtype: bool

Or, similarly, using apply with set conversion:

(df.groupby([df.datetime.dt.date, 'rating']).signal
   .apply(lambda x: len(set(x)) == 1))

datetime    rating
2018-12-20  HY         True
            IG        False
2018-12-27  HY         True
            IG         True
Name: signal, dtype: bool

PS., you don't need to assign a temp column, groupby takes arbitrary grouper arguments.

cs95
  • 379,657
  • 97
  • 704
  • 746
  • follow up: is there an easy way to recover the index of the 'odd-one-out'? we can assume there's only one per day – Yuca Dec 28 '18 at 16:26
  • @Yuca do you mean per day there would only be one odd one out (either true or false)? Is it possible there can be no odd one out? – cs95 Dec 28 '18 at 17:00
  • yes, it is possible that there is no odd one out, I have a pretty ugly way of doing it that involves using grouby twice :S – Yuca Dec 28 '18 at 17:02
5

Try to find out alternative without using groupby just for fun

df.datetime=df.datetime.dt.date

s=pd.crosstab(df.datetime,[df.rating,df.signal])


s.eq(s.sum(axis=1,level=0),1).any(level=0,axis=1).stack()
Out[556]: 
datetime    rating
2018-12-20  HY         True
            IG        False
2018-12-27  HY         True
            IG         True
dtype: bool
BENY
  • 317,841
  • 20
  • 164
  • 234
  • I have a challenge for you, make it `false` only if it's not the last entry of the day :) – Yuca Dec 27 '18 at 21:48