I have a MultiIndexed data frame that I am breaking up on one of the indices. The data looks something like this:
import pandas as pd

some_vals = [['blue', 'green']] * 3 + [['orange', 'yellow']] * 2 + [['violet', 'fuligin']] * 5
some_index = pd.MultiIndex.from_tuples([('foo', 1)] * 3 + [('bar', 4)] * 2 + [('baz', 7)] * 5,
                                       names=('wibble', 'wobble'))
some_data = pd.DataFrame(some_vals, index=some_index, columns=('quality', 'aspect'))
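To make the intended invariant concrete: in this sample, each wibble maps to a single repeated row, which can be checked directly (a quick sanity check, reusing the setup above):

```python
import pandas as pd

some_vals = [['blue', 'green']] * 3 + [['orange', 'yellow']] * 2 + [['violet', 'fuligin']] * 5
some_index = pd.MultiIndex.from_tuples([('foo', 1)] * 3 + [('bar', 4)] * 2 + [('baz', 7)] * 5,
                                       names=('wibble', 'wobble'))
some_data = pd.DataFrame(some_vals, index=some_index, columns=('quality', 'aspect'))

# Partial indexing on the first level pulls out one wibble's rows
foo_rows = some_data.loc['foo']               # the three ('foo', 1) rows
n_distinct_foo = len(foo_rows.drop_duplicates())  # identical rows collapse to one
```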
In my actual data, as in some_data, all the rows for a given wibble should be identical. To verify this programmatically, I do this:
grouped = some_data.groupby(level='wibble')
grouped.apply(lambda g: g.value_counts().count() == 1)
# wibble
# bar True
# baz True
# foo True
# dtype: bool
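For comparison, the same per-group test can be phrased with drop_duplicates instead of value_counts, counting the distinct rows left in each group (a sketch against the sample data above; I haven't benchmarked it):

```python
import pandas as pd

some_vals = [['blue', 'green']] * 3 + [['orange', 'yellow']] * 2 + [['violet', 'fuligin']] * 5
some_index = pd.MultiIndex.from_tuples([('foo', 1)] * 3 + [('bar', 4)] * 2 + [('baz', 7)] * 5,
                                       names=('wibble', 'wobble'))
some_data = pd.DataFrame(some_vals, index=some_index, columns=('quality', 'aspect'))

grouped = some_data.groupby(level='wibble')

# One boolean per group: True when the group collapses to a single distinct row
uniform = grouped.apply(lambda g: len(g.drop_duplicates()) == 1)

# A disagreeing row should flip its group to False
bad = some_data.copy()
bad.iloc[0, 0] = 'red'  # break one of the 'foo' rows
uniform_bad = bad.groupby(level='wibble').apply(lambda g: len(g.drop_duplicates()) == 1)
```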
This works, but it gets pretty slow when I scale it up to my actual data set, taking several seconds for about 5000 rows. That makes me think there must be a more efficient (and possibly more natural) way to accomplish this.
Using nunique doesn't quite get me what I want, which is a shame because it's somewhat faster:
grouped.apply(lambda g: (g.nunique() == 1)).all()
# quality True
# aspect True
# dtype: bool
I want to catch groups where the rows do not agree, but this reduces column-wise across the whole frame, so I end up with one boolean per column instead of one per group.
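One way to keep nunique while getting a per-group answer is to move the .all() inside the lambda, so each group reduces to a single boolean (a sketch; whether it is actually faster on the real 5000-row data would need checking):

```python
import pandas as pd

some_vals = [['blue', 'green']] * 3 + [['orange', 'yellow']] * 2 + [['violet', 'fuligin']] * 5
some_index = pd.MultiIndex.from_tuples([('foo', 1)] * 3 + [('bar', 4)] * 2 + [('baz', 7)] * 5,
                                       names=('wibble', 'wobble'))
some_data = pd.DataFrame(some_vals, index=some_index, columns=('quality', 'aspect'))

grouped = some_data.groupby(level='wibble')

# .all() inside the lambda collapses each group's per-column unique counts
# to one boolean, so the result is indexed by wibble rather than by column
per_group = grouped.apply(lambda g: (g.nunique() == 1).all())
```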