Simple setup: I have a list of roughly 40,000 entries, each of which is a list of strings (2 to 15 elements each). I want to compare all of the sublists to check whether they have an element in common (any two sublists share at most one). At the end, I want to build a dictionary (a graph, if you like) where the index of each sublist is a key and its value is the list of indices of the other sublists with which it shares an element.
For example,

```python
lst = [['dam', 'aam', 'adm', 'ada', 'adam'], ['va', 'ea', 'ev', 'eva'], ['va', 'aa', 'av', 'ava']]
```

should give the following:

```python
dic = {0: [], 1: [2], 2: [1]}
```
My problem is that I found a solution, but it's very computationally expensive. First, I wrote a function to compute the intersection of two lists:
```python
def intersection(lst1, lst2):
    # build a set from the second list for fast membership tests
    temp = set(lst2)
    lst3 = [value for value in lst1 if value in temp]
    return lst3
```
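Since only the existence of a common element matters (not the full intersection), a minimal sketch of a cheaper check uses the built-in `set.isdisjoint`, which can stop at the first shared element. The name `share_element` is hypothetical, chosen for illustration:

```python
def share_element(lst1, lst2):
    """Return True if the two lists have at least one element in common (hypothetical helper)."""
    # isdisjoint accepts any iterable and returns False as soon as a common element is found
    return not set(lst1).isdisjoint(lst2)

print(share_element(['va', 'ea', 'ev', 'eva'], ['va', 'aa', 'av', 'ava']))  # True
print(share_element(['dam', 'aam'], ['va', 'aa']))  # False
```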
Then I loop over all the lists to check for intersections:
```python
dic = {}
iter_range = range(len(lst))
# loop over all pairs (i, k) where k != i
for i in iter_range:
    # create a range that doesn't contain i
    new_range = list(iter_range)
    new_range.remove(i)
    # use a separate name so the input list `lst` is not overwritten
    neighbours = []
    for k in new_range:
        # check whether the lists at positions i and k intersect
        if len(intersection(lst[i], lst[k])) > 0:
            neighbours.append(k)
    # fill dictionary
    dic[i] = neighbours
```
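One way to restructure this loop, sketched below under the assumption that the input is the `lst` from the example, is to convert every sublist to a set once and use `itertools.combinations` so that each unordered pair is checked exactly once, recording the edge in both directions:

```python
from itertools import combinations

lst = [['dam', 'aam', 'adm', 'ada', 'adam'],
       ['va', 'ea', 'ev', 'eva'],
       ['va', 'aa', 'av', 'ava']]

# Convert each sublist to a set once, instead of once per comparison
sets = [set(sub) for sub in lst]
dic = {i: [] for i in range(len(lst))}

# combinations yields each unordered pair (i, k) with i < k exactly once
for i, k in combinations(range(len(sets)), 2):
    if not sets[i].isdisjoint(sets[k]):
        # record the edge in both directions
        dic[i].append(k)
        dic[k].append(i)

print(dic)  # {0: [], 1: [2], 2: [1]}
```

This halves the number of comparisons, but it is still quadratic: for 40,000 sublists it checks about 800 million pairs.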
I know that Python for loops are slow and that I compare every pair twice (in the example above, I compare 1 with 2 and then 2 with 1), but I don't know how to restructure the code to make it run faster.
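A commonly used alternative (a different technique from the pairwise comparison above, not something from the original post) is an inverted index: map each string to the indices of the sublists that contain it, then connect all sublists listed under the same string. The sketch below assumes strings are hashable and uses a hypothetical function name `build_graph`. If each string appears in only a few sublists, this avoids comparing all pairs:

```python
from collections import defaultdict

def build_graph(lst):
    """Map each sublist index to the indices of sublists sharing an element with it (hypothetical helper)."""
    # Inverted index: string -> indices of the sublists that contain it
    index = defaultdict(list)
    for i, sub in enumerate(lst):
        for s in set(sub):  # set() guards against duplicates within one sublist
            index[s].append(i)

    # Every pair of indices listed under the same string shares that string
    neighbours = defaultdict(set)
    for indices in index.values():
        for i in indices:
            for k in indices:
                if i != k:
                    neighbours[i].add(k)

    # Include sublists with no neighbours and return sorted lists, as in the example
    return {i: sorted(neighbours[i]) for i in range(len(lst))}

lst = [['dam', 'aam', 'adm', 'ada', 'adam'],
       ['va', 'ea', 'ev', 'eva'],
       ['va', 'aa', 'av', 'ava']]
print(build_graph(lst))  # {0: [], 1: [2], 2: [1]}
```

The work here grows with the total number of strings plus, for each string, the square of the number of sublists containing it, so it is fast when strings are rarely shared and degrades toward the quadratic case when a few strings appear almost everywhere.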