I am using sklearn.svm.SVR
for a "regression task" which I want to use my "customized kernel method". Here is the dataset samples and the code:
index density speed label
0 14 58.844020 77.179139
1 29 67.624946 78.367394
2 44 77.679100 79.143744
3 59 79.361877 70.048869
4 74 72.529289 74.499239
.... and so on
from sklearn import svm
import pandas as pd
import numpy as np
density = np.random.randint(0,100, size=(3000, 1))
speed = np.random.randint(20,80, size=(3000, 1)) + np.random.random(size=(3000, 1))
label = np.random.randint(20,80, size=(3000, 1)) + np.random.random(size=(3000, 1))
d = np.hstack((a,b,c))
data = pd.DataFrame(d, columns=['density', 'speed', 'label'])
data.density = data.density.astype(dtype=np.int32)
def my_kernel(X,Y):
return np.dot(X,X.T)
svr = svm.SVR(kernel=my_kernel)
x = data[['density', 'speed']].iloc[:2000]
y = data['label'].iloc[:2000]
x_t = data[['density', 'speed']].iloc[2000:3000]
y_t = data['label'].iloc[2000:3000]
svr.fit(x,y)
y_preds = svr.predict(x_t)
the problem happens in the last line svm.predict
which says:
X.shape[1] = 1000 should be equal to 2000, the number of samples at training time
I searched the web to find a way to deal with the problem but many questions alike (like {1}, {2}, {3}) were left unanswered.
Actually, I had used SVM methods with rbf
, sigmoid
, ... before and the code was working just fine but this was my first time using customized kernels and I suspected that it must be the reason why this error happened.
So after a little research and reading documentation I found out that when using precomputed
kernels, the shape of the matrix for SVR.predict()
must be like [n_samples_test, n_samples_train]
shape.
I wonder how to modify x_test
in order to get predictions and everything works just fine with no problem like when we don't use customized kernels?
If possible please describe "the reason that why the inputs for svm.predict
function in precomputed
kernel differentiates with the other kernels".
I really hope the unanswered questions that are related to this issue could be answered respectively.