Given a large collection of sets S={s1,...,sn}, all of the same size, I want to find all pairs (si,sj) whose overlap (the size of their intersection) is at least M.
So if M=2 and S consists of
- s1 = (3,4,8,9)
- s2 = (1,3,7,8)
- s3 = (1,2,5,6)
- s4 = (1,6,7,8)
I want to identify the pairs (s1,s2), (s2,s4), and (s3,s4).
The straightforward approach compares every pair and checks the size of the intersection, but this is prohibitively slow given the number of sets and the size of the sets I am using (something like O(log(m) n^2), where m is the size of each set?).
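For reference, here is roughly what I mean by the straightforward approach, as a minimal Python sketch. The function name `overlapping_pairs` and the 0-based index pairs are just my own choices for illustration, and it assumes the elements are hashable. It performs n(n-1)/2 intersections, which is the part I would like to avoid.

```python
from itertools import combinations


def overlapping_pairs(sets, min_overlap):
    """Return index pairs (i, j), i < j, whose sets share >= min_overlap elements.

    Brute-force baseline: every pair of sets is intersected once.
    """
    # Convert once up front so each intersection uses hash lookups.
    frozen = [frozenset(s) for s in sets]
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(frozen), 2)
        if len(a & b) >= min_overlap
    ]


# The example from above; indices are 0-based, so (0, 1) means (s1, s2).
S = [(3, 4, 8, 9), (1, 3, 7, 8), (1, 2, 5, 6), (1, 6, 7, 8)]
print(overlapping_pairs(S, 2))  # [(0, 1), (1, 3), (2, 3)]
```

On the example this gives (s1,s2), (s2,s4), and (s3,s4), as expected.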
I've searched around and haven't found a similar question (though this answer is probably relevant). Any help would be greatly appreciated!