I am trying to find subsets (of any length) of attributes (columns) whose value combinations are unique in a given dataset. To the best of my knowledge, the most efficient way to find them is to run many groupby operations in pandas and count the resulting group sizes. Since the list of column subsets can become very large, what is the most efficient way to speed up these repeated groupbys on the same dataset?
groups = [["a","b"],["a","b","c"],["c","a"]] # this can be really large
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['a', 'b', 'c'])
for g in groups:
r = df.groupby(g, sort=False).size().reset_index().rename(columns={0:'count'})
if r.loc[r['count']==1]['count'].count() > 0:
# do something
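For context, the check inside the loop only asks whether at least one combination of values in g occurs exactly once. A minimal equivalent sketch of that test using df.duplicated (no count frame needed), in case it clarifies the goal:

def has_unique_combination(df, cols):
    # keep=False marks every row of a duplicated combination as True,
    # so ~duplicated flags exactly the combinations that occur once
    return (~df.duplicated(subset=cols, keep=False)).any()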