How can I group values from a list using fuzzy matching with an 80% similarity threshold?
combined_list = ['magic', 'simple power', 'matrix', 'simple aa', 'madness', 'magics', 'mgcsa', 'simple pws', 'seek', 'dour', 'softy']
should yield:
['magic', 'magics'], ['simple pws', 'simple aa'], ['simple power'], ['matrix']
Here is what I have so far, but the result is very different from my goal. It also only handles a few values, whereas I plan to run it on about 50,000 records.
from difflib import SequenceMatcher as sm
combined_list = ['magic', 'simple power', 'matrix', 'madness', 'magics', 'mgcsa', 'simple pws', 'seek', 'sour', 'soft']
result = list()
result_group = list()
for x in combined_list:
    for name in combined_list:
        if(sm(None, x, name).ratio() >= 0.80):
            result_group.append(name)
        else:
            pass
    result.append(result_group)
    print(result)
    del result_group[:]
print(result)
The print(result) outside the loop shows only empty lists, while the print(result) inside the loop contains the values I need, although the output is still different from what I want:
[['magic', 'magics']]
[['simple power', 'simple pws'], ['simple power', 'simple pws']]
[['matrix'], ['matrix'], ['matrix']]
[['madness'], ['madness'], ['madness'], ['madness']]
[['magic', 'magics'], ['magic', 'magics'], ['magic', 'magics'], ['magic', 'magics'], ['magic', 'magics']]
[['mgcsa'], ['mgcsa'], ['mgcsa'], ['mgcsa'], ['mgcsa'], ['mgcsa']]
[['simple power', 'simple pws'], ['simple power', 'simple pws'], ['simple power', 'simple pws'], ['simple power', 'simple pws'], ['simple power', 'simple pws'], ['simple power', 'simple pws'], ['simple power', 'simple pws']]
[['seek'], ['seek'], ['seek'], ['seek'], ['seek'], ['seek'], ['seek'], ['seek']]
[['sour'], ['sour'], ['sour'], ['sour'], ['sour'], ['sour'], ['sour'], ['sour'], ['sour']]
[['soft'], ['soft'], ['soft'], ['soft'], ['soft'], ['soft'], ['soft'], ['soft'], ['soft'], ['soft']]
[['simple aa'], ['simple aa'], ['simple aa'], ['simple aa'], ['simple aa'], ['simple aa'], ['simple aa'], ['simple aa'], ['simple aa'], ['simple aa'], ['simple aa']]
[[], [], [], [], [], [], [], [], [], [], []]
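For what it's worth, the empty lists at the end come from `result` holding the same `result_group` object many times: `del result_group[:]` empties that one list in place, so every entry in `result` is emptied with it. Below is a minimal sketch of one possible way around that, not a definitive solution. It assumes a greedy strategy (each word joins the first group whose first member it matches at 80% or more), and the name `group_similar` is made up for illustration. `real_quick_ratio()` and `quick_ratio()` are cheap upper bounds from `difflib` that let most non-matching pairs be skipped before the expensive `ratio()` call; the worst case is still quadratic, so 50,000 records may remain slow.

```python
from difflib import SequenceMatcher


def group_similar(words, threshold=0.80):
    """Greedy grouping (an assumed strategy): a word joins the first
    group whose first member is similar enough, else starts a new group."""
    groups = []
    for word in words:
        for group in groups:
            matcher = SequenceMatcher(None, group[0], word)
            # The two quick ratios are upper bounds on ratio(), so
            # checking them first rejects most pairs cheaply.
            if (matcher.real_quick_ratio() >= threshold
                    and matcher.quick_ratio() >= threshold
                    and matcher.ratio() >= threshold):
                group.append(word)
                break
        else:
            # No existing group matched: start a fresh list, so groups
            # never share (and later clear) the same list object.
            groups.append([word])
    return groups


combined_list = ['magic', 'simple power', 'matrix', 'madness', 'magics',
                 'mgcsa', 'simple pws', 'seek', 'sour', 'soft']
print(group_similar(combined_list))
```

Because each word is compared only with the first member of a group, the result depends on input order. Also, with `SequenceMatcher`, 'simple aa' scores below 0.80 against both 'simple power' and 'simple pws', so it would end up in its own group at this threshold.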