import pandas as pd
import numpy as np
from sklearn import preprocessing, svm
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import math
import numpy.linalg as la

df = pd.read_csv("DataWithoutHeader162.csv")
df.columns = ['Temperature','Humidity','Windspeed','Traffic','PM 2.5']
#print(df.head())

forecast_col = 'PM 2.5'
df['label'] = df[forecast_col].shift(1)
df.fillna(value=-99999, inplace=True)

X = np.array(df.drop(['label','PM 2.5'], axis=1))
X = preprocessing.scale(X)
df.dropna(inplace = True)

y = np.array(df['label'])
df.dropna(inplace = True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05) 


#kernel definition
def radial_basis(gamma=10):
    return lambda x, y: np.exp(-gamma*la.norm(np.subtract(x, y)))

#SupportVectorMachine with radial_basis Kernel
clf_SVM_radial_basis = SVC(kernel = radial_basis())
clf_SVM_radial_basis.fit(X_train,y_train)
confidence3 = clf_SVM_radial_basis.score(X_test,y_test)
print("Confidence of SVM with radial_basis Kernel = ",(confidence3*100),"%")

This code raises the following error:

Traceback (most recent call last):
  File "F:\MachineLearningPyCodes\SvmOnDelhiAqiDataPrbf.py", line 68, in <module>
    clf_SVM_radial_basis.fit(X_train,y_train)
  File "C:\Python35\lib\site-packages\sklearn\svm\base.py", line 189, in fit
    fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
  File "C:\Python35\lib\site-packages\sklearn\svm\base.py", line 230, in _dense_fit
    if X.shape[0] != X.shape[1]:
IndexError: tuple index out of range

I tried several approaches, but I suspect my dataset is not in the format SVC expects. How should I format it?

1 Answer


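I was also confused by how SVC's kernel argument works. A callable passed as kernel is not called on one pair of samples at a time, as your radial_basis assumes. scikit-learn calls it with two whole data matrices and expects the Gram matrix back, i.e. an array of shape (n_samples_1, n_samples_2). Your lambda collapses both matrices into a single number, so the result has no dimensions and X.shape[0] fails with the IndexError in your traceback. The sklearn documentation describes this: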

http://scikit-learn.org/stable/modules/svm.html#custom-kernels
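To see where the IndexError comes from, here is a minimal sketch that calls the question's kernel the way a callable kernel gets called, with two full matrices. The random 5x4 matrix is a hypothetical stand-in for the real data.

```python
import numpy as np
import numpy.linalg as la

# The kernel exactly as written in the question
def radial_basis(gamma=10):
    return lambda x, y: np.exp(-gamma * la.norm(np.subtract(x, y)))

# Hypothetical stand-in for X_train: 5 samples, 4 features
X = np.random.RandomState(0).randn(5, 4)

# Called with two matrices, la.norm returns one Frobenius norm,
# so the whole result is a single scalar rather than a 5x5 matrix
K = radial_basis()(X, X)
print(np.shape(K))  # () -> indexing shape[0] raises IndexError
```

An empty shape tuple is what makes `X.shape[0]` fail inside `_dense_fit`.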

To summarize that section: you have two choices.

(1) either plug the Gram matrix directly into the fit() method (not just plain X_train), and use kernel='precomputed'; or

(2) write a function that returns the Gram matrix, and then you can pass that new function instead to kernel.
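For completeness, the first option might look roughly like the sketch below. The helper name rbf_gram and the synthetic data are my own assumptions for illustration, and the kernel keeps the unsquared norm from the question rather than the usual squared-distance RBF.

```python
import numpy as np
from sklearn.svm import SVC

def rbf_gram(A, B, gamma=10.0):
    """Hypothetical helper: Gram matrix of exp(-gamma * ||a - b||)."""
    # Pairwise Euclidean distances via broadcasting, shape (len(A), len(B))
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return np.exp(-gamma * dists)

# Synthetic stand-in data (assumption, not the asker's CSV)
rng = np.random.RandomState(0)
X_train = rng.randn(40, 4)
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.randn(10, 4)

clf = SVC(kernel='precomputed')
# fit() receives the square train-vs-train Gram matrix, not X_train
clf.fit(rbf_gram(X_train, X_train), y_train)
# predict() needs the test-vs-train Gram matrix, shape (n_test, n_train)
pred = clf.predict(rbf_gram(X_test, X_train))
```

Note that with kernel='precomputed' you have to build the test-versus-train matrix yourself every time you predict or score.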

This SO question has good examples. Adapting what they wrote, you could do it as follows. I'll use the second method and keep your original radial_basis formula for illustration.

def radial_basis(x, y, gamma=10):
    return np.exp(-gamma * la.norm(np.subtract(x, y)))

def proxy_kernel(X, Y, K=radial_basis):
    """Return the Gram matrix of K between the rows of X and the rows of Y,
    with shape (X.shape[0], Y.shape[0]), as SVC expects from a callable kernel.
    """
    gram_matrix = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            gram_matrix[i, j] = K(x, y)
    return gram_matrix

clf_SVM_radial_basis = SVC(kernel=proxy_kernel) # Note that it's proxy_kernel here now
clf_SVM_radial_basis.fit(X_train, y_train)
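The double loop makes one Python call per pair of samples, which can get slow on larger datasets. As a rough sketch, the same Gram matrix can be computed in one shot with scipy's cdist. The name vectorized_kernel and the synthetic data are assumptions for illustration only.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.svm import SVC

def vectorized_kernel(X, Y, gamma=10):
    """Gram matrix of exp(-gamma * ||x - y||), shape (len(X), len(Y))."""
    # cdist returns all pairwise Euclidean distances between rows of X and Y
    return np.exp(-gamma * cdist(X, Y, metric="euclidean"))

# Synthetic stand-in data (assumption, not the asker's CSV)
rng = np.random.RandomState(0)
X_demo = rng.randn(60, 4)
y_demo = (X_demo[:, 0] > 0).astype(int)

clf = SVC(kernel=vectorized_kernel)
clf.fit(X_demo[:50], y_demo[:50])
accuracy = clf.score(X_demo[50:], y_demo[50:])
```

This keeps the unsquared norm from the question. If you want the standard RBF kernel, passing metric="sqeuclidean" to cdist would give the squared distances instead.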