You can also call value_counts() here to get the size of each group.
import pandas as pd
df = pd.DataFrame({'L1': list('ZXYXYXY'), 'L2': [1, 0, 1, 0, 0, 0, 1]})
  L1  L2
0  Z   1
1  X   0
2  Y   1
3  X   0
4  Y   0
5  X   0
6  Y   1
The basic idea is to get the size of each group and keep the group keys (for the grouper grp below) whose groups have at least 3 rows.
grp = 'L1'
size = df.value_counts(grp)
size.index[size>=3] # Index(['X', 'Y'], dtype='object', name='L1')
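The same steps can be wrapped in a small helper. This is only a sketch: the function name large_group_keys and its min_size parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'L1': list('ZXYXYXY'), 'L2': [1, 0, 1, 0, 0, 0, 1]})

def large_group_keys(frame, by, min_size):
    """Hypothetical helper: keys of the groups in frame[by] with at least min_size rows."""
    size = frame.value_counts(by)          # one count per distinct key
    return size.index[size >= min_size]    # boolean mask keeps the large groups

keys = large_group_keys(df, 'L1', 3)

# one possible follow-up: keep only the rows that belong to those groups
filtered = df[df['L1'].isin(keys)]
```

Here keys holds 'X' and 'Y', and filtered drops the single 'Z' row.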
If we want to work with the group ids instead, then numpy.unique() could be useful. The basic idea is to count how often each group id occurs and keep the ids that occur at least 3 times. This gives the ids of the groups that have at least 3 rows.
If we want to look at the group keys that correspond to these ids, we can use them to index the group_keys_seq attribute; the result is equal to the index obtained with value_counts() above.¹
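import numpy as np
grp = 'L1'
# note: .grouper and its attributes are pandas internals and may emit
# deprecation warnings (or be unavailable) on recent pandas versions
g = df.groupby(grp).grouper
# count how often each group id occurs
u, c = np.unique(g.group_info[0], return_counts=True)
idx = u[c >= 3] # array([0, 1], dtype=int64)
g.group_keys_seq[idx] # Index(['X', 'Y'], dtype='object', name='L1')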
¹ If the groupby is done on multiple columns, then group_keys_seq returns a list of tuples, so it can't be indexed like g.group_keys_seq[idx]. In that case, use pd.MultiIndex.from_tuples(g.group_keys_seq)[idx] instead.
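For the multi-column case, a sketch that stays on public API may be easier to keep working across pandas versions. It swaps the internal attributes for ngroup() (one integer id per row) and size() (group keys as a MultiIndex), and it assumes the default sort=True so that both follow the same group order. The threshold is lowered to 2 here only so that more than one group survives.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'L1': list('ZXYXYXY'), 'L2': [1, 0, 1, 0, 0, 0, 1]})

grp = ['L1', 'L2']
gb = df.groupby(grp)

# integer id of the group each row belongs to
ids = gb.ngroup().to_numpy()

# count how often each id occurs and keep the ids seen at least twice
u, c = np.unique(ids, return_counts=True)
idx = u[c >= 2]

# size() lists the group keys in the same order as the ids (assuming sort=True),
# so positional indexing maps ids back to keys
keys = gb.size().index[idx]
```

keys is a MultiIndex containing ('X', 0) and ('Y', 1), the two (L1, L2) combinations that occur at least twice.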