
I am working with a fairly large data set on which I need to do binary classification using a kernelized perceptron. I am using this source code: https://gist.github.com/mblondel/656147 .

Here there are three things that can be parallelized: 1) the kernel computation, 2) the update rule, and 3) the projection part. I also applied some other speedups, such as computing only the upper triangular part of the kernel matrix and then mirroring it into the full symmetric matrix:

    K = np.zeros((n_samples, n_samples))
    for index in itertools.combinations_with_replacement(range(n_samples), 2):
        K[index] = self.kernel(X[index[0]], X[index[1]], self.gamma)
    # make the full kernel matrix
    K = K + np.triu(K, 1).T
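For what it's worth, the triangular-fill-plus-mirror trick can be sanity-checked in isolation. The sketch below uses a plain dot product as a stand-in kernel (the real kernel above also takes `gamma`, which is omitted here):

```python
import itertools
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(5, 3)
n_samples = len(X)

# stand-in for self.kernel(x, y, gamma): a plain dot product
kernel = lambda x, y: np.dot(x, y)

# fill only the upper triangle (diagonal included) ...
K = np.zeros((n_samples, n_samples))
for index in itertools.combinations_with_replacement(range(n_samples), 2):
    K[index] = kernel(X[index[0]], X[index[1]])

# ... then mirror the strictly-upper part down to get the full symmetric matrix
K = K + np.triu(K, 1).T
```

For this kernel, `K` should come out symmetric and equal to `np.dot(X, X.T)`, which confirms the mirroring is correct.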

I also parallelized the projection part like this:

    def parallel_project(self, X):
        """Parallelized prediction."""
        y_predict = np.zeros(self.nOfWorkers, dtype=object)

        pool = mp.Pool(processes=self.nOfWorkers)
        results = [pool.apply_async(prediction_worker,
                                    args=(self.alpha, self.sv_y, self.sv, self.kernel, (parts,)))
                   for parts in np.array_split(X, self.nOfWorkers)]
        pool.close()
        pool.join()

        for i, r in enumerate(results):
            y_predict[i] = r.get()
        return np.hstack(y_predict)
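One detail worth noting about the split/rejoin pattern above: `np.array_split` preserves order and tolerates worker counts that don't divide the data evenly, so the final `np.hstack` reassembles the predictions in the original sample order. A minimal check:

```python
import numpy as np

y = np.arange(10, dtype=float)    # pretend these are per-sample predictions
chunks = np.array_split(y, 7)     # 7 "workers"; chunk sizes may differ by one
reassembled = np.hstack(chunks)
assert np.array_equal(reassembled, y)   # order round-trips
```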

and the worker:

    def prediction_worker(alpha, sv_y, sv, kernel, samples):
        """Worker for the parallelized prediction part."""
        print "starting:", mp.current_process().name
        X = samples[0]
        y_predict = np.zeros(len(X))
        for i in range(len(X)):
            s = 0
            for a1, sv_y1, sv1 in zip(alpha, sv_y, sv):
                s += a1 * sv_y1 * kernel(X[i], sv1)
            y_predict[i] = s
        return y_predict

but the code is still too slow. Can you give me any hints on parallelization, or any other way to speed it up?

Remark: please provide a general solution; I am dealing with custom kernel functions.

thanks

Moj
  • Don't parallelize, [Cythonize](http://cython.org). Effective parallel programming in Python is hard because the single-threaded interpreter comes with a lot of overhead. Or use scikit-learn, which has SVMs and soon hopefully a kernel ridge classifier by the author of that very piece of code. – Fred Foo Jul 18 '13 at 09:57
  • Did you check where the bottlenecks are? (And how many processes you spawn? If you parallelize into too many, the overhead dominates when each task is too small.) Also, zip is slow; use numpy.vstack instead. – usethedeathstar Jul 18 '13 at 09:58
  • @larsmans The problem is that I am not a C expert, and writing this code in C would be too much of a headache for me. – Moj Jul 18 '13 at 10:00
  • @usethedeathstar Actually I am just using the `multiprocessing` module and not dealing with threads myself. I am running this on an 8-core CPU and usually use 7 processes. Do you think it's better to use the `threading` module? – Moj Jul 18 '13 at 10:02
  • Also: avoid the for-loops and vectorize everything; that way the for-loops run inside numpy, which is essentially C and executes them much faster than pure Python. There are also existing tools for profiling Python code. – usethedeathstar Jul 18 '13 at 11:01

1 Answer


Here's something that should give you an instant speedup. The kernels in Mathieu's example code take single samples, but then full Gram matrices are computed using them:

K = np.zeros((n_samples, n_samples))
for i in range(n_samples):
    for j in range(n_samples):
        K[i,j] = self.kernel(X[i], X[j])

This is slow, and can be avoided by vectorizing the kernel functions:

def linear_kernel(X, Y):
    return np.dot(X, Y.T)
def polynomial_kernel(X, Y, p=3):
    return (1 + np.dot(X, Y.T)) ** p
# the Gaussian RBF kernel is a bit trickier

Now the Gram matrix can be computed as just

K = kernel(X, X)
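For the RBF case flagged as trickier above, one common approach (a sketch here, not taken from the gist, which parameterizes by sigma rather than the `gamma` used below) is the expansion ||x − y||² = ||x||² + ||y||² − 2·x·y:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    # exp(-gamma * ||x - y||^2) for every pair of rows, fully vectorized
    XX = np.sum(X * X, axis=1)[:, np.newaxis]    # shape (n, 1)
    YY = np.sum(Y * Y, axis=1)[np.newaxis, :]    # shape (1, m)
    sq_dists = XX + YY - 2.0 * np.dot(X, Y.T)
    np.maximum(sq_dists, 0, out=sq_dists)        # clip tiny negatives from rounding
    return np.exp(-gamma * sq_dists)

# agrees with the naive per-pair loop
rng = np.random.RandomState(0)
A, B = rng.randn(4, 3), rng.randn(5, 3)
naive = np.array([[np.exp(-0.1 * np.sum((a - b) ** 2)) for b in B] for a in A])
assert np.allclose(rbf_kernel(A, B), naive)
```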

The project function should be changed accordingly to speed that up as well.
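A sketch of what that change could look like (attribute names assumed from the gist — `alpha`, `sv`, `sv_y` — and written as a free function for clarity): with a vectorized kernel, the double loop over samples and support vectors collapses into one kernel call and one matrix product.

```python
import numpy as np

def project(alpha, sv_y, sv, kernel, X):
    # one vectorized kernel call replaces the Python double loop
    K = kernel(X, sv)                  # shape (n_test, n_support)
    return np.dot(K, alpha * sv_y)     # weighted sum over support vectors

# check against the scalar double loop it replaces, using a linear kernel
linear_kernel = lambda A, B: np.dot(A, B.T)
rng = np.random.RandomState(1)
X, sv = rng.randn(6, 4), rng.randn(3, 4)
alpha, sv_y = rng.rand(3), np.sign(rng.randn(3))

loop = np.array([sum(a * y * np.dot(x, s)
                     for a, y, s in zip(alpha, sv_y, sv)) for x in X])
assert np.allclose(project(alpha, sv_y, sv, linear_kernel, X), loop)
```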

Fred Foo
  • That's true, but it doesn't work in my case. I am using somewhat unusual kernel functions to deal with structured data; this is my kernel: http://stackoverflow.com/questions/15479777/efficient-algorithm-instead-of-looping – Moj Jul 18 '13 at 10:15
  • @Moj: Then you should still try to exploit vectorized operations in the kernel function, or switch to SVMs, which should require fewer kernel function invocations. Have you profiled to see where the program is spending its time? – Fred Foo Jul 18 '13 at 10:38
  • I agree about the vectorization, but I couldn't figure out how to do it. I would be interested if someone here could give me a hint. It spends a lot of time on the update rule and the projection; the kernel is a bit faster than those parts. – Moj Jul 18 '13 at 10:52
  • @Moj: I haven't looked at your kernel in detail, but is there a chance that you can set up your structured objects as `scipy.sparse` matrices and use `sparse.linalg` computations? – Fred Foo Jul 18 '13 at 11:35