I would like to compute the similarity between the descriptions in two lists, and after that cluster the descriptions. The similarity matrix I am after looks like this:
        L2D1   L2D2   L2D3   ...   L2Dn
L1D1    0      0.3    0.8    ...   0.5
L1D2    0.2    0.7    0.3    ...   0.2
L1D3    0      0.3    0.8    ...   0.5
...     ...    ...    ...    ...   ...
L1Dn    0.6    0.1    0.9    ...   0.4
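The values in the table lie between 0 and 1, whereas an edit distance is a raw integer count. One possible way to bridge the two, sketched here as an assumption rather than a requirement, is to divide the distance by the length of the longer string and subtract from 1. The helper names `edit_distance`, `similarity` and `similarity_matrix` and the sample strings are hypothetical; `edit_distance` is a plain-Python stand-in so the sketch runs without any extra package.

```python
def edit_distance(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance (stand-in for a library call)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest


def similarity_matrix(list1, list2):
    """Rows follow list1, columns follow list2, as in the table above."""
    return [[similarity(a, b) for b in list2] for a in list1]


# Hypothetical sample descriptions, for illustration only.
sample1 = ["red apple", "green pear"]
sample2 = ["red apple", "red apples"]
matrix = similarity_matrix(sample1, sample2)
```

With this normalisation identical strings score 1.0 and completely different strings score close to 0; other normalisations are equally possible.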
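# Method 1
import numpy as np
from Levenshtein import distance

List1 = list(new['Description'])
List2 = list(clean['Description'])

# Matrix[i, j] holds the edit distance between List1[i] and List2[j].
# dtype=int is used because the np.int alias was removed in NumPy 1.24.
Matrix = np.zeros((len(List1), len(List2)), dtype=int)
for i in range(len(List1)):
    for j in range(len(List2)):
        Matrix[i, j] = distance(List1[i], List2[j])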
The above method is very time consuming because of the size of the lists and the length of the descriptions.
So in Method 2 I tried to compare only the first five words of each pair of descriptions: if they match, calculate the distance between the two full strings; otherwise move on to the next description in the list.
#Method2
for i in range(0,len(List1)):
    K1[i]=str(List1[:1]).split()[0:5]
    for j in range(0,len(List2)):
        K1[i]=str(List2[:1]).split()[0:5]
        if (distance(K1[i],K2[j]))==0:
            Matrix[i,j]=distance(List1[i],List2[j])
        else:
            Matrix[i,j]=1000
But as I am new to this, I am missing some logic and getting the following error:
TypeError: 'int' object does not support item assignment
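For context, Python raises this message whenever the name being indexed on the left of an assignment is bound to an int. The code above does not show how `K1` and `K2` are created, so the sketch below only assumes that one of them was set to a number (for example `K1 = 0`) before the loop; the variable `n` and the sample words are hypothetical.

```python
n = 3

K1 = 0                      # assumption: K1 was bound to an int somewhere earlier
try:
    K1[0] = ["some", "words"]
except TypeError as exc:
    message = str(exc)      # the same message as in the traceback above

K1 = [None] * n             # a list of placeholders does accept item assignment
K1[0] = ["some", "words"]
```

If that assumption holds, creating `K1` and `K2` as lists of the right length before the loops should make the assignment itself work.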
I also want to implement the same for the next 10 and 100 words. Thanks in advance.
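A minimal sketch of the prefix check with the number of words as a parameter, so that 5, 10 or 100 words use the same code. It reads "first five words" as the first `n_words` whitespace-separated tokens of each description, which is an interpretation. The names `first_words`, `prefix_filtered_matrix`, `toy_distance` and the sample lists are hypothetical; `toy_distance` is only a placeholder so the sketch runs on its own, and the real distance function would be passed as `dist` instead.

```python
def first_words(text, n_words):
    """Return the first n_words whitespace-separated words of text as a tuple."""
    return tuple(str(text).split()[:n_words])


def prefix_filtered_matrix(list1, list2, dist, n_words=5, no_match=1000):
    """Compute dist(a, b) only where the first n_words of a and b are equal.

    Every other cell keeps the sentinel value no_match (1000 in the question).
    """
    # Extract each prefix once, outside the double loop.
    keys1 = [first_words(s, n_words) for s in list1]
    keys2 = [first_words(s, n_words) for s in list2]
    matrix = [[no_match] * len(list2) for _ in list1]
    for i, a in enumerate(list1):
        for j, b in enumerate(list2):
            if keys1[i] == keys2[j]:      # compare word lists directly
                matrix[i][j] = dist(a, b)
    return matrix


def toy_distance(a, b):
    """Placeholder metric (not Levenshtein): mismatched positions plus length gap."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Hypothetical sample data, for illustration only.
list1 = ["blue cotton shirt size M", "steel water bottle"]
list2 = ["blue cotton shirt size L", "glass water bottle"]

matrix_3 = prefix_filtered_matrix(list1, list2, toy_distance, n_words=3)
matrix_5 = prefix_filtered_matrix(list1, list2, toy_distance, n_words=5)
```

The prefixes are compared with `==` rather than by calling a string distance on them, since they are lists of words and only an exact match is of interest here.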