
I am getting an error while implementing online training of a perceptron in scikit-learn. I have referred to this Stack Overflow question, but I am unable to figure out my mistake.

The dataset I am experimenting with has 1000 rows and 11 columns: 10 feature columns and 1 class label column. I am attaching the code for reference:

import numpy as np
import pandas as pd
from pandas import Series,DataFrame
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

df = pd.read_csv(r'C:\Users\sjrk\Desktop\ML\Machine learning practise\d-10.csv')

X = df[['D-0','D-1','D-2','D-3','D-4','D-5','D-6','D-7','D-8','D-9']]
y = df['C']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

scalar_model = StandardScaler()
scalar_model.fit(X_train)
X_train_std = scalar_model.transform(X_train)
X_test_std = scalar_model.transform(X_test)
#perceptron initialization
ppn = Perceptron(n_iter = 100,eta0=0.1,random_state=0)
# Online training
num_samples = X_train_std.shape[0]
classes_y =  np.unique(y_train)
X_train_std = X_train_std.reshape(700,10)
y_train = y_train.reshape(700,1)

for i in range(num_samples):

    ppn.partial_fit(X_train_std[i], y_train[i], classes = classes_y )

It throws this error:

ValueError: Expected 2D array, got 1D array instead:
array=[ 1.6540008  -0.09311816 -0.17325239 -1.21276374 -1.27102032 -0.51813835
  1.74932495 -1.49606596  0.61310441 -0.66910947].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I am doing something wrong with the reshaping in the online training. Please help me out.

Thanks

jrk
  • All you need to do is follow what the error tells you. ```[1,2,3]``` is different from ```[[1,2,3]]```. The latter is what sklearn wants (2 dimensions!). In your case: ```ppn.partial_fit(X_train_std[i].reshape(1,-1), y_train[i], classes = classes_y )``` – sascha Feb 18 '18 at 16:34
  • After reshaping as you suggested, it shows "ValueError: bad input shape ()". My question is: since we have 10 feature columns in the training dataset, we should feed each row with 10 columns into the perceptron to train, and the predicted output should be compared with the true value. X_train is already a two-dimensional array with shape (700,10). – jrk Feb 18 '18 at 18:40
  • It's not after indexing with i in that loop. This is a pretty trivial usage issue and there is probably a good doc page. If you still get errors, edit your code into something we can run too and show the changes. – sascha Feb 18 '18 at 18:41
  • Thank you, I will look into the documentation again. – jrk Feb 18 '18 at 18:47
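
For anyone landing here with the same two errors, the shape issue discussed in the comments can be sketched as follows. This is only a minimal illustration, not the asker's final code: the CSV is not available, so `X_train` and `y_train` are synthetic stand-ins with the same 700 x 10 shape, and `n_iter` is left out because it may not be accepted by current scikit-learn versions. The second error ("bad input shape ()") most likely comes from passing a scalar label, so the sketch slices the labels instead of indexing them.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 700 x 10 training split (assumption: the real data is not available).
rng = np.random.RandomState(0)
X_train = rng.randn(700, 10)
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)  # 1-D array of class labels

X_train_std = StandardScaler().fit_transform(X_train)
classes_y = np.unique(y_train)

ppn = Perceptron(eta0=0.1, random_state=0)

# Online training: one sample per partial_fit call.
for i in range(X_train_std.shape[0]):
    # Indexing a 2-D array with a single integer gives a 1-D row of shape (10,);
    # reshape(1, -1) turns it back into "one sample, ten features".
    x_i = X_train_std[i].reshape(1, -1)
    # Slicing keeps a 1-D array of shape (1,), whereas y_train[i] would be a scalar.
    y_i = y_train[i:i + 1]
    ppn.partial_fit(x_i, y_i, classes=classes_y)

train_accuracy = ppn.score(X_train_std, y_train)
print(x_i.shape, y_i.shape, train_accuracy)
```

If `y_train` is a pandas Series, as in the question, converting it first with `y_train.values` should give the plain 1-D NumPy array this loop assumes.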
