Assume I have two very large dictionaries, bigDictA and bigDictB, structured something like this:

bigDictA = {frozenset(["a1", "b2", "c3"]): [some floats in a list], ...}
bigDictB = {frozenset(["a1", "b2"]): [some floats in a list], ...}
Now, the algorithm I wrote, which I need help optimizing, looks like this:
setOfMatches = set()
for bigDictAkey, bigDictAval in bigDictA.items():
    for bigDictBkey, bigDictBval in bigDictB.items():
        itemsInCommon = list(frozenset.intersection(bigDictAkey, bigDictBkey))
        numberOfItemsInCommon = len(itemsInCommon)
        valForComparison = THRESHOLD * float(len(list(bigDictAkey)))
        if numberOfItemsInCommon >= valForComparison:
            setOfMatches.add(bigDictAkey)
So if THRESHOLD was 0.3, frozenset(["a1", "b2", "c3"]) would be added to setOfMatches (it shares 2 of its 3 items with the bigDictB key, and 2 >= 0.3 * 3), but if THRESHOLD was 0.7 it wouldn't be added (2 < 0.7 * 3).
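To make that concrete, here is a tiny runnable version of the example (the float values are just placeholders):

THRESHOLD = 0.3

bigDictA = {frozenset(["a1", "b2", "c3"]): [0.1, 0.2]}
bigDictB = {frozenset(["a1", "b2"]): [0.3, 0.4]}

setOfMatches = set()
for bigDictAkey in bigDictA:
    for bigDictBkey in bigDictB:
        # 2 shared items >= 0.3 * 3, so the 3-item key is a match
        if len(bigDictAkey & bigDictBkey) >= THRESHOLD * len(bigDictAkey):
            setOfMatches.add(bigDictAkey)

print(setOfMatches)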
I realize this isn't efficient, but I am definitely open to any suggestions, including converting the key datatype to some other datatype for a speedup. I have also looked into tools like Numba and Cython, though I'd prefer to keep it in pure Python. It needs to be crazy fast!
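To show the kind of restructuring I mean, here is an untested sketch (it assumes bigDictA, bigDictB, and THRESHOLD are defined as above); it indexes bigDictB's keys by the items they contain so each bigDictA key only visits the bigDictB keys it can actually overlap with. I have no idea whether this is the right direction:

from collections import defaultdict

# Inverted index: each item maps to the set of bigDictB keys containing it
itemToBKeys = defaultdict(set)
for bKey in bigDictB:
    for item in bKey:
        itemToBKeys[item].add(bKey)

setOfMatches = set()
for aKey in bigDictA:
    needed = THRESHOLD * len(aKey)
    # count shared items per candidate bigDictB key
    overlapCounts = defaultdict(int)
    for item in aKey:
        for bKey in itemToBKeys.get(item, ()):
            overlapCounts[bKey] += 1
    if any(count >= needed for count in overlapCounts.values()):
        setOfMatches.add(aKey)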
Any help is really appreciated! Thanks!