
I suspect an SO post has already answered this question, but I have not been able to find it, so I apologize in advance if this is a duplicate.

I'm attempting to implement a radial basis function kernel from scratch using Numpy for my own learning purposes. For one-dimensional inputs, the calculation is pretty easy:

import numpy as np

def kernel(x, y):
    return np.exp(-0.5 * np.subtract.outer(x, y) ** 2)

The above is from a blog post on Gaussian Processes.
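For example, on a small 1-D input this kernel produces the full pairwise matrix in one call (a quick sketch; the input values here are my own):

```python
import numpy as np

def kernel(x, y):
    return np.exp(-0.5 * np.subtract.outer(x, y) ** 2)

a = np.array([0.0, 1.0, 2.0])
K = kernel(a, a)
print(K.shape)                       # (3, 3)
print(np.allclose(np.diag(K), 1.0))  # True: distance from a point to itself is 0
```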

But I'm trying to extend this to multiple dimensions. I have an implementation that works fine below:

import numpy as np

x = np.array([[4, 3, 5], [1, 3, 9], [0, 1, 0], [4, 3, 5]])
γ = -.5
distances = []
for i in x:
    for j in x:
        distances.append(np.exp(γ * np.linalg.norm(i - j) ** 2))
print(np.array(distances).reshape(len(x), len(x)))

[[1.00000000e+00 3.72665317e-06 1.69189792e-10 1.00000000e+00]
 [3.72665317e-06 1.00000000e+00 2.11513104e-19 3.72665317e-06]
 [1.69189792e-10 2.11513104e-19 1.00000000e+00 1.69189792e-10]
 [1.00000000e+00 3.72665317e-06 1.69189792e-10 1.00000000e+00]]

I am checking against sklearn.metrics.pairwise.rbf_kernel:

from sklearn.metrics.pairwise import rbf_kernel
print(rbf_kernel(x, gamma=.5))

[[1.00000000e+00 3.72665317e-06 1.69189792e-10 1.00000000e+00]
 [3.72665317e-06 1.00000000e+00 2.11513104e-19 3.72665317e-06]
 [1.69189792e-10 2.11513104e-19 1.00000000e+00 1.69189792e-10]
 [1.00000000e+00 3.72665317e-06 1.69189792e-10 1.00000000e+00]]

But clearly the double for loops are not the most efficient way of iterating through this. What's the best way to vectorize this operation?

This SO post provides an efficient way of calculating distances, but does not provide the vectorization I need.

Yu Chen

2 Answers


We can use SciPy's cdist to get the squared Euclidean distances and then exponentiate -

from scipy.spatial.distance import cdist

lam = -.5
out = np.exp(lam * cdist(x, x, 'sqeuclidean'))
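A quick sanity check of the cdist route against the double loop from the question (a sketch, using the same x as above):

```python
import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[4, 3, 5], [1, 3, 9], [0, 1, 0], [4, 3, 5]])
lam = -.5
out = np.exp(lam * cdist(x, x, 'sqeuclidean'))

# reference: the double loop from the question, as a nested comprehension
ref = np.array([[np.exp(lam * np.linalg.norm(i - j) ** 2) for j in x] for i in x])
print(np.allclose(out, ref))  # True
```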

We can also leverage matrix-multiplication -

def sqcdist_own(x):
    row_sum = (x**2).sum(1) # or np.einsum('ij,ij->i',x,x)
    sqeucdist = row_sum - 2*x.dot(x.T)
    sqeucdist += row_sum[:,None]
    return sqeucdist

out = np.exp(lam * sqcdist_own(x))

To use these approaches on both 2D and 1D cases, reshape x to 2D as a pre-processing step: X = x.reshape(len(x), -1), and then use X as the input to these solutions.
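To illustrate that pre-processing step on a 1-D input (a sketch; the sample values are my own):

```python
import numpy as np
from scipy.spatial.distance import cdist

x1d = np.array([0.0, 1.0, 2.0])
X = x1d.reshape(len(x1d), -1)  # shape (3, 1): the 2-D code path now applies
out = np.exp(-0.5 * cdist(X, X, 'sqeuclidean'))
print(out.shape)  # (3, 3)
```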

Divakar

You can utilize the following observation to solve the problem:

||a - b|| ** 2 = ||a|| ** 2 + ||b|| ** 2 - 2 * <a, b>

In code, it looks as follows:

x_norm = np.linalg.norm(x, axis=1) ** 2
output = np.exp(- 0.5 * (x_norm.reshape(-1, 1) + x_norm.reshape(1, -1) - 2 * np.dot(x, x.T)))
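One caveat worth hedging: with floating-point inputs, the dot-product identity can produce tiny negative squared distances through round-off, so clipping at zero before exponentiating is a common safeguard (a sketch based on the identity above, not part of the original answer):

```python
import numpy as np

x = np.array([[4., 3., 5.], [1., 3., 9.], [0., 1., 0.], [4., 3., 5.]])
x_norm = (x ** 2).sum(axis=1)                          # ||a||^2 per row
sq_dists = x_norm[:, None] + x_norm[None, :] - 2 * x @ x.T
sq_dists = np.maximum(sq_dists, 0)                     # guard against tiny negatives
output = np.exp(-0.5 * sq_dists)
print(output[0, 3])  # 1.0, since rows 0 and 3 are identical
```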
Abhishek Mishra