Say I have 10 unordered lists of 100 strings each. What's the fastest way to find which lists have a high degree of overlap (e.g. 50%+) with one or more other lists, and which list(s) they overlap with?
What if we scaled it up to 1,000,000,000 unordered lists of 10,000 strings each? What's the most efficient way to identify the overlapping lists at that scale?
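
To make the small case concrete, here is a minimal sketch of the brute-force baseline. It rests on an assumption that is not spelled out above: "overlap" is taken to mean the number of distinct shared strings divided by the size of the smaller list. The function name `overlapping_pairs` and the `threshold` parameter are made up for illustration.

```python
from itertools import combinations


def overlapping_pairs(lists, threshold=0.5):
    """Return (i, j, overlap) for every pair of lists at or above the threshold.

    Assumption: overlap = |A & B| / min(|A|, |B|), counted over distinct strings.
    """
    # Convert each list to a set once so that intersections are cheap.
    sets = [set(lst) for lst in lists]
    pairs = []
    # Compare every unordered pair of lists exactly once.
    for (i, a), (j, b) in combinations(enumerate(sets), 2):
        if not a or not b:
            continue  # an empty list cannot overlap with anything
        overlap = len(a & b) / min(len(a), len(b))
        if overlap >= threshold:
            pairs.append((i, j, overlap))
    return pairs


if __name__ == "__main__":
    sample = [
        ["a", "b", "c", "d"],
        ["c", "d", "e", "f"],
        ["x", "y", "z", "w"],
    ]
    print(overlapping_pairs(sample))
```

This compares every pair of lists, so the work grows quadratically with the number of lists. That is fine for 10 lists, but it is clearly not viable for the 1,000,000,000-list case, which is the part I'm really asking about.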