If you only have strings, why not use a set?
Target = set(Target_Column.tolist())
You can also use a set as the default value for your mapping:
from collections import defaultdict
clusters = defaultdict(set)
But this requires changing list.append to set.add in your loop.
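As a rough sketch of what that loop could look like (your actual loop is not shown here, so the nested iteration, the `build_clusters` name and the length-based `distance` are assumptions for illustration only):

```python
from collections import defaultdict

def distance(a, b):
    # Stand-in metric (difference in length); replace with your own distance function.
    return abs(len(a) - len(b))

def build_clusters(words, threshold):
    # Hypothetical helper: maps every word to the set of words within `threshold` of it.
    clusters = defaultdict(set)
    for w1 in words:
        for w2 in words:
            if distance(w1, w2) <= threshold:
                clusters[w1].add(w2)  # set.add instead of list.append: duplicates are dropped
    return clusters

clusters = build_clusters(["abc", "def", "abcd", "abc", "abcdefghijk"], 3)
```

Because the values are sets, the repeated "abc" in the input does not produce duplicate entries.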
There is, however, a more Pythonic alternative to your code.
I would probably generate a mapping from words to the set of their connections on the fly.
Here is an example assuming words is a set of all words:
clusters = {w1: set(w2 for w2 in words if distance(w1, w2) <= threshold) for w1 in words}
Live example:
>>> distance = lambda x, y: abs(len(x) - len(y))
>>> words = set("abc def abcd abcdefghijk abcdefghijklmnopqrstuv".split())
>>> threshold = 3
>>> clusters = {w1: set(w2 for w2 in words if distance(w1, w2) <= threshold) for w1 in words}
>>> for cluster, values in clusters.items():
...     print(cluster, ":", ", ".join(values))
...
abcd : abcd, abc, def
abc : abcd, abc, def
abcdefghijk : abcdefghijk
abcdefghijklmnopqrstuv : abcdefghijklmnopqrstuv
def : abcd, abc, def
Increasing the threshold, we get one big "cluster" containing all words:
>>> threshold = 100
>>> clusters = {w1: set(w2 for w2 in words if distance(w1, w2) <= threshold) for w1 in words}
>>> for cluster, values in clusters.items():
...     print(cluster, ":", ", ".join(values))
...
abcd : abcd, abc, abcdefghijk, abcdefghijklmnopqrstuv, def
abc : abcd, abc, abcdefghijk, abcdefghijklmnopqrstuv, def
abcdefghijk : abcd, abc, abcdefghijk, abcdefghijklmnopqrstuv, def
abcdefghijklmnopqrstuv : abcd, abc, abcdefghijk, abcdefghijklmnopqrstuv, def
def : abcd, abc, abcdefghijk, abcdefghijklmnopqrstuv, def
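The length-based distance above is only a toy. The same comprehension works with any distance function, for example an edit distance. Here is a minimal sketch, assuming a hand-written `edit_distance` helper (a hypothetical name, not a library function) and made-up sample words:

```python
def edit_distance(a, b):
    # Dynamic-programming Levenshtein distance: counts the insertions,
    # deletions and substitutions needed to turn a into b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute (or match)
        previous = current
    return previous[-1]

words = {"kitten", "sitting", "mitten", "banana"}  # sample data, not from the question
threshold = 2
clusters = {w1: {w2 for w2 in words if edit_distance(w1, w2) <= threshold} for w1 in words}
```

With these words, "kitten" and "mitten" end up in each other's sets, while "sitting" and "banana" are only connected to themselves. Note that this compares every pair of words, so it can get slow for large vocabularies.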
from Levenshtein import distance – Ajay Jadhav May 24 '16 at 07:08