I suspect there is probably a SO post that has answered this question, but I have not been able to find it yet, so I apologize in advance if this is a duplicate question.
I'm attempting to implement a radial basis function kernel from scratch using Numpy for my own learning purposes. For one-dimensional inputs, the calculation is pretty easy:
import numpy as np

def kernel(x, y):
    return np.exp(-0.5 * np.subtract.outer(x, y)**2)
The above is from a blog post on Gaussian Processes.
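For example (the sample array here is my own, just to show the shape of the result), calling it on a 1-D array returns the full Gram matrix:

a = np.array([0.0, 1.0, 2.0])
print(kernel(a, a))  # 3x3 matrix; the diagonal is all 1.0 since exp(0) == 1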
But I'm trying to extend this to multiple dimensions. I have a working implementation below:
x = np.array([[4, 3, 5], [1, 3, 9], [0, 1, 0], [4, 3, 5]])
distances = []
γ = -.5
for i in x:
    for j in x:
        distances.append(np.exp(γ * np.linalg.norm(i - j) ** 2))
print(np.array(distances).reshape(len(x), len(x)))
[[1.00000000e+00 3.72665317e-06 1.69189792e-10 1.00000000e+00]
[3.72665317e-06 1.00000000e+00 2.11513104e-19 3.72665317e-06]
[1.69189792e-10 2.11513104e-19 1.00000000e+00 1.69189792e-10]
[1.00000000e+00 3.72665317e-06 1.69189792e-10 1.00000000e+00]]
I am checking my result against sklearn.metrics.pairwise.rbf_kernel:
from sklearn.metrics.pairwise import rbf_kernel
print(rbf_kernel(x, gamma=.5))
[[1.00000000e+00 3.72665317e-06 1.69189792e-10 1.00000000e+00]
[3.72665317e-06 1.00000000e+00 2.11513104e-19 3.72665317e-06]
[1.69189792e-10 2.11513104e-19 1.00000000e+00 1.69189792e-10]
[1.00000000e+00 3.72665317e-06 1.69189792e-10 1.00000000e+00]]
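As a quick sanity check (my own addition, not from either source), np.allclose confirms the two matrices agree:

print(np.allclose(np.array(distances).reshape(len(x), len(x)), rbf_kernel(x, gamma=.5)))  # should print True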
But the double for loop is clearly not the most efficient way to compute this. What's the best way to vectorize this operation?
This SO post provides an efficient way of calculating distances, but does not provide the vectorization I need.
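Based on that post's distance trick, I imagine something like the sketch below (my own attempt, using the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b), but I'm not sure it's the best approach:

def rbf_vectorized(X, gamma=0.5):
    # squared norm of each row
    sq_norms = np.sum(X**2, axis=1)
    # pairwise squared Euclidean distances via broadcasting
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    # clip tiny negatives caused by floating-point rounding
    np.maximum(sq_dists, 0, out=sq_dists)
    return np.exp(-gamma * sq_dists)

print(rbf_vectorized(x))  # should match the matrices above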