Create a Boolean Series with your condition then groupby
+ transform('any')
to form a mask for the original DataFrame. This allows you to simply slice the original DataFrame.
df[~df.Count.lt(8).groupby(df.Groups).transform('any')]
# Groups Count
#3 2 12
#4 2 15
#5 2 21
While the syntax of groupby
+ filter
is more straightforward, it performs much worse for a large number of groups, so creating the Boolean mask with transform
is preferred. In this example there's over a 1000x improvement. The .isin
method works extremely fast for a single column but would require switching to a merge if grouping on multiple columns.
import pandas as pd
import numpy as np
np.random.seed(123)
N = 50000
df = pd.DataFrame({'Groups': [*range(N//2)]*2,
'Count': np.random.randint(0, 1000, N)})
# Double check both are equivalent
(df.groupby('Groups').filter(lambda x: x['Count'].min() >= 8)
== df[~df.Count.lt(8).groupby(df.Groups).transform('any')]).all().all()
#True
%timeit df.groupby('Groups').filter(lambda x: x['Count'].min() >= 8)
#8.15 s ± 80.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df[~df.Count.lt(8).groupby(df.Groups).transform('any')]
#6.54 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df[~df['Groups'].isin(df.loc[df['Count'] < 8, 'Groups'].unique())]
#2.88 ms ± 24 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)