I have a MultiIndexed data frame that I am breaking up on one of the indices. The data looks something like this:
import pandas as pd

some_vals = [['blue', 'green']] * 3 + [['orange', 'yellow']] * 2 + [['violet', 'fuligin']] * 5
some_index = pd.MultiIndex.from_tuples([('foo', 1)] * 3 + [('bar', 4)] * 2 + [('baz', 7)] * 5,
                                       names=('wibble', 'wobble'))
some_data = pd.DataFrame(some_vals, index=some_index, columns=('quality', 'aspect'))
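To make the intended invariant concrete: in this sample, each wibble maps to a single repeated row, which can be checked directly (a quick sanity check, reusing the setup above):

```python
import pandas as pd

some_vals = [['blue', 'green']] * 3 + [['orange', 'yellow']] * 2 + [['violet', 'fuligin']] * 5
some_index = pd.MultiIndex.from_tuples([('foo', 1)] * 3 + [('bar', 4)] * 2 + [('baz', 7)] * 5,
                                       names=('wibble', 'wobble'))
some_data = pd.DataFrame(some_vals, index=some_index, columns=('quality', 'aspect'))

# Partial indexing on the first level pulls out one wibble's rows
foo_rows = some_data.loc['foo']               # the three ('foo', 1) rows
n_distinct_foo = len(foo_rows.drop_duplicates())  # identical rows collapse to one
```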
In my actual data, as in some_data, all the rows for a given wibble should be identical. To verify this programmatically, I do this:
grouped = some_data.groupby(level='wibble')
grouped.apply(lambda g: g.value_counts().count() == 1)
# wibble
# bar True
# baz True
# foo True
# dtype: bool
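For comparison, the same per-group test can be phrased with drop_duplicates instead of value_counts, counting the distinct rows left in each group (a sketch against the sample data above; I haven't benchmarked it):

```python
import pandas as pd

some_vals = [['blue', 'green']] * 3 + [['orange', 'yellow']] * 2 + [['violet', 'fuligin']] * 5
some_index = pd.MultiIndex.from_tuples([('foo', 1)] * 3 + [('bar', 4)] * 2 + [('baz', 7)] * 5,
                                       names=('wibble', 'wobble'))
some_data = pd.DataFrame(some_vals, index=some_index, columns=('quality', 'aspect'))

grouped = some_data.groupby(level='wibble')

# One boolean per group: True when the group collapses to a single distinct row
uniform = grouped.apply(lambda g: len(g.drop_duplicates()) == 1)

# A disagreeing row should flip its group to False
bad = some_data.copy()
bad.iloc[0, 0] = 'red'  # break one of the 'foo' rows
uniform_bad = bad.groupby(level='wibble').apply(lambda g: len(g.drop_duplicates()) == 1)
```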
This works, but it gets pretty slow when I scale it up to my actual data set, taking several seconds for about 5000 rows. That makes me think there must be a more efficient (and possibly more natural) way to accomplish this.
Using nunique doesn't quite get me what I want, which is a shame because it's somewhat faster:
grouped.apply(lambda g: (g.nunique() == 1)).all()
# quality True
# aspect True
# dtype: bool
I want to catch groups where the rows do not agree, but this reduces column-wise across the whole frame, so I end up with one boolean per column instead of one per group.
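One way to keep nunique while getting a per-group answer is to move the .all() inside the lambda, so each group reduces to a single boolean (a sketch; whether it is actually faster on the real 5000-row data would need checking):

```python
import pandas as pd

some_vals = [['blue', 'green']] * 3 + [['orange', 'yellow']] * 2 + [['violet', 'fuligin']] * 5
some_index = pd.MultiIndex.from_tuples([('foo', 1)] * 3 + [('bar', 4)] * 2 + [('baz', 7)] * 5,
                                       names=('wibble', 'wobble'))
some_data = pd.DataFrame(some_vals, index=some_index, columns=('quality', 'aspect'))

grouped = some_data.groupby(level='wibble')

# .all() inside the lambda collapses each group's per-column unique counts
# to one boolean, so the result is indexed by wibble rather than by column
per_group = grouped.apply(lambda g: (g.nunique() == 1).all())
```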