I need a fast Python implementation of a simple wrapper around a scipy.stats function. The function only takes vectors, not matrices. Below are two implementations of the wrapper, but both have similar run times. Is it possible to speed either of them up without moving the implementation to C/C++?

import numpy as np
from scipy.stats import kendalltau

def wrap1(X, y):
    corr = []
    p_value = []
    X = np.array(X).transpose()
    y = np.ravel(y)
    for col in X:
        ktau = kendalltau(col, y, nan_policy='raise')
        corr.append(ktau[0])
        p_value.append(ktau[1])
    return corr, p_value

# Version 2

def wrap2(X, y):
    X = np.array(X).transpose()
    y = np.tile(np.ravel(y), (X.shape[0], 1))
    corr, p_value = zip(*[kendalltau(a, b, nan_policy='raise')
                          for a, b in zip(X, y)])
    return corr, p_value

Sample run:

t1 = np.arange(30).reshape(10,3)
t2 = np.arange(10).reshape(10,)
wrap1(t1,t2)
wrap2(t1,t2)

Thanks a lot

sophros
agarg

1 Answer

1) In your wrap1 function, preallocate the corr and p_value arrays and fill them in instead of appending to lists.

2) Replace np.array(X) with np.asarray(X); this avoids making a copy of X if it is already an array.

That's probably all you can easily do while staying at the Python level.
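Here is a minimal sketch of how points 1 and 2 might look together. The name wrap_prealloc is made up for illustration, and it indexes the columns of X directly rather than transposing first. Most of the run time is probably spent inside kendalltau itself, so the gain from these two changes is likely to be small.

```python
import numpy as np
from scipy.stats import kendalltau


def wrap_prealloc(X, y):
    """Kendall's tau of every column of X against y (illustrative sketch)."""
    # asarray avoids a copy when X is already an ndarray.
    X = np.asarray(X)
    y = np.ravel(y)
    n_cols = X.shape[1]

    # Preallocate the outputs instead of appending to Python lists.
    corr = np.empty(n_cols)
    p_value = np.empty(n_cols)

    for j in range(n_cols):
        res = kendalltau(X[:, j], y, nan_policy='raise')
        corr[j] = res[0]
        p_value[j] = res[1]
    return corr, p_value


# Same sample inputs as in the question.
t1 = np.arange(30).reshape(10, 3)
t2 = np.arange(10).reshape(10,)
corr, p_value = wrap_prealloc(t1, t2)
```

Note that this returns NumPy arrays rather than the lists returned by wrap1.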

If that's not enough, you can try profiling the kendalltau function. There is quite a bit going on inside it. If you see a significant fraction of the time spent on, for example, checking your arrays for NaN values, and you are sure your inputs do not contain any, you can copy the relevant parts of the scipy implementation into your own code.
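One possible way to do that profiling is with the standard-library cProfile module, as sketched below. The helper name profile_kendalltau, the repeat count, and the random test vectors are assumptions for illustration only; substitute a column of your real data to get a representative breakdown.

```python
import cProfile
import io
import pstats

import numpy as np
from scipy.stats import kendalltau


def profile_kendalltau(x, y, repeats=200, top=10):
    """Return a text report of where repeated kendalltau calls spend time."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeats):
        kendalltau(x, y, nan_policy='raise')
    profiler.disable()

    # Sort by cumulative time and keep only the `top` most expensive entries.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats('cumulative').print_stats(top)
    return buf.getvalue()


# Hypothetical inputs: two random vectors of length 10.
rng = np.random.default_rng(0)
report = profile_kendalltau(rng.random(10), rng.random(10))
print(report)
```

The rows with the largest cumulative time show which internal steps are worth bypassing, if any.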

ev-br