
I have a program that computes a pairwise distance matrix and then applies the k-means algorithm. I tested it on a small list and it works fine and fast; however, my original list is very big (>5000 items), so it takes forever and I ended up terminating the run. Can I use outer() or some other parallel function on the distance function to make this faster? On the small set that I have:

strings = ['cosine cos', 'cosine', 'cosine???????', 'l1', 'l2', 'manhattan']

Its distance array (a 3D array) comes back like this:

[[[ 0.          0.25        0.47826087  1.          1.          0.89473684]
  [ 0.25        0.          0.36842105  1.          1.          0.86666667]
  [ 0.47826087  0.36842105  0.          1.          1.          0.90909091]
  [ 1.          1.          1.          0.          0.5         1.        ]
  [ 1.          1.          1.          0.5         0.          1.        ]
  [ 0.89473684  0.86666667  0.90909091  1.          1.          0.        ]]]

Each row of the array above holds the distances from one item of the strings list to all the others. My way of doing it with for loops is:

import numpy as np
import Levenshtein

strings = ['cosine cos', 'cosine', 'cosine???????', 'l1', 'l2', 'manhattan']

data1 = []
for j in range(len(np.array(list(strings)))):
    for i in range(len(strings)):
        data1.append(1 - Levenshtein.ratio(np.array(list(strings))[j], np.array(list(strings))[i]))

#n = map(Levenshtein.ratio, strings)
#n = reduce(Levenshtein.ratio, strings)
#print(n)

k = len(strings)
data2 = np.asarray(data1)
arr_3d = data2.reshape((1, k, k))
print(arr_3d)

Here arr_3d is the array shown above. How can I use outer() or map() to replace the for loops above? When the strings list is big, this runs for hours and never even produces a result. I appreciate the help. Levenshtein.ratio is a built-in function in Python.
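For reference, a map()-based version of the same computation is sketched below, using itertools.product over all ordered pairs; as the comments below explain, this only hides the loop in library code rather than removing the quadratic number of Levenshtein.ratio calls (Levenshtein here is the python-Levenshtein package):

import itertools
import numpy as np
import Levenshtein

strings = ['cosine cos', 'cosine', 'cosine???????', 'l1', 'l2', 'manhattan']
k = len(strings)

# map() over every ordered pair; still k*k calls to Levenshtein.ratio
pairs = itertools.product(strings, repeat=2)
data1 = list(map(lambda pair: 1 - Levenshtein.ratio(pair[0], pair[1]), pairs))

arr_3d = np.asarray(data1).reshape((1, k, k))
print(arr_3d)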

  • `reduce` and `map` won't make this any faster. Why are you doing `np.array(list(strings))[j]` instead of just `strings[j]`? – user2357112 May 26 '16 at 20:04
  • Also, `Levenshtein.ratio` is not a thing that comes with Python. Where is this function coming from? – user2357112 May 26 '16 at 20:05
  • That was left over from an earlier attempt to fix another error; it isn't necessary and can just be strings[j]. But what would make it faster then? – Lelo May 26 '16 at 20:06
  • It comes from the package called "Levenshtein", so I should have `import Levenshtein` at the very beginning. – Lelo May 26 '16 at 20:07
  • Using `map` does not mean the loop disappears; it just means it is not in your code. There is no magic trick here. – njzk2 May 26 '16 at 20:09
  • What about reduce()? I used that function in R and it makes things faster than for loops, but I don't know how to use it in Python. Any ideas? – Lelo May 26 '16 at 20:10
  • `reduce` won't help you either. This isn't even a reduction operation. The best you can do without switching technologies is to take out those unnecessary, hideously expensive `np.array(list(strings))`. You might be able to do somewhat better with Cython or C. – user2357112 May 26 '16 at 20:14
  • Not sure if that's the issue; I'm facing slowness even before adding those calls. I'm restricted to using Python. – Lelo May 26 '16 at 20:18
  • Can you show me how to do better with Cython? – Lelo May 26 '16 at 20:19
  • And when I wait, I'm getting a MemoryError. – Lelo May 26 '16 at 21:10

1 Answer

import numpy as np
import Levenshtein

strings = ['cosine cos', 'cosine', 'cosine???????', 'l1', 'l2', 'manhattan']

k = len(strings)

data = np.zeros((k, k))

for i, string1 in enumerate(strings):
    for j, string2 in enumerate(strings):
        data[i, j] = 1 - Levenshtein.ratio(string1, string2)

print(data)

There are no gains to be had from map or reduce here; as @user2357112 mentions, the loops still need to run. However, this version is cleaner and should run faster, since it avoids the np.array(list(strings)) calls you were using throughout.
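If it is still too slow, note that the matrix you printed is symmetric with a zero diagonal, i.e. Levenshtein.ratio(a, b) == Levenshtein.ratio(b, a). A minimal sketch that computes only the upper triangle and mirrors it, roughly halving the number of ratio calls:

import numpy as np
import Levenshtein

strings = ['cosine cos', 'cosine', 'cosine???????', 'l1', 'l2', 'manhattan']
k = len(strings)

data = np.zeros((k, k))           # diagonal stays 0: each string's distance to itself
for i in range(k):
    for j in range(i + 1, k):     # upper triangle only
        d = 1 - Levenshtein.ratio(strings[i], strings[j])
        data[i, j] = d
        data[j, i] = d            # mirror into the lower triangle

print(data)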

  • Thanks, that made it faster. Still slow, though, when the clustering operation comes. – Lelo May 27 '16 at 19:02
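Since the question also asked about parallelism: the rows of the distance matrix are independent, so they can be computed in worker processes with the standard-library multiprocessing module. Below is a minimal Python 3 sketch (distance_row is a hypothetical helper name); note that for k = 5000 the k × k float64 result alone takes 5000 × 5000 × 8 bytes ≈ 200 MB, which may be related to the MemoryError mentioned in the comments:

import numpy as np
import Levenshtein
from multiprocessing import Pool

# replace with the full list of >5000 strings
strings = ['cosine cos', 'cosine', 'cosine???????', 'l1', 'l2', 'manhattan']

def distance_row(i):
    # all distances from strings[i] to every string; rows are independent
    return [1 - Levenshtein.ratio(strings[i], s) for s in strings]

if __name__ == '__main__':
    with Pool() as pool:  # defaults to one worker per CPU core
        rows = pool.map(distance_row, range(len(strings)))
    data = np.asarray(rows)
    print(data)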