I am trying to reduce the computation time of my script, which is run with PyPy. It has to calculate the pairwise sums of absolute differences for a large number of lists/vectors/arrays. The input vectors are quite short, between 10 and 500 elements. I have tested three different approaches so far:
1) Naive approach, input as lists:
from itertools import izip
import math

def std_sum(v1, v2):
    # accumulate the absolute differences element by element
    distance = 0.0
    for a, b in izip(v1, v2):
        distance += math.fabs(a - b)
    return distance
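(For two vectors this is simply the Manhattan/L1 distance, e.g. std_sum([1.0, 2.0], [3.0, 1.0]) gives 3.0.)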
2) With lambdas and reduce, input as lists:
lzi = lambda v1, v2: reduce(lambda s, (a, b): s + math.fabs(a - b), izip(v1, v2), 0)

def lmd_sum(v1, v2):
    # fold the zipped pairs, accumulating the absolute differences
    return lzi(v1, v2)
3) Using numpy, input as numpy.arrays:
import numpy as np

def np_sum(v1, v2):
    # vectorized: elementwise difference, absolute value, then sum
    return np.sum(np.abs(v1 - v2))
On my machine, running under PyPy and taking the pairs from itertools.combinations_with_replacement of 500 such lists, the first two approaches are very similar (roughly 5 seconds), while the numpy approach is significantly slower, taking around 12 seconds.
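Roughly, the benchmark boils down to a loop like the following (a simplified sketch, not the actual script; benchmark is a placeholder name, and vectors is assumed to already hold the 500 parsed lists):

import time
from itertools import combinations_with_replacement

def benchmark(dist_fn, vectors):
    # time dist_fn over every pair, including each list paired with itself
    start = time.time()
    total = 0.0
    for v1, v2 in combinations_with_replacement(vectors, 2):
        total += dist_fn(v1, v2)
    return time.time() - start, total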
Is there a faster way to do the calculations? The lists are read and parsed from text files, so increased preprocessing time (such as creating numpy arrays) would not be a problem. The lists contain floating-point numbers and are all of equal size, which is known beforehand.
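For instance, building the numpy arrays once while parsing would look something like this (a hypothetical sketch assuming one whitespace-separated vector per line; load_vectors is a made-up name, not part of my script):

import numpy as np

def load_vectors(path):
    # parse each line into a list of floats and convert it to a
    # numpy array once, so the per-pair calls pay no conversion cost
    with open(path) as f:
        lists = [[float(x) for x in line.split()] for line in f]
    arrays = [np.array(l) for l in lists]
    return lists, arrays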
The script I use for "benchmarking" can be found here, and some example data here.