
I'm trying to run the MLkNN classifier on my pandas DataFrame, and when I try to fit the classifier I get this error message:

Series object has no attribute 'getformat'

Here's the code:

from skmultilearn.adapt import MLkNN
from sklearn.model_selection import GridSearchCV, train_test_split

# Hyperparameter grid for MLkNN and the scoring metric
parameters = {'k': range(1, 3), 's': [0.5, 0.7, 1.0]}
score = 'f1_macro'

# Features are every column except the target
X = dados.drop(['defects'], axis=1)
y = dados['defects']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

classifier = GridSearchCV(MLkNN(), parameters, scoring=score)
classifier.fit(X_train, y_train)

My DataFrame is shown below:

dtypes and data head

error message

  • Please add the error output for more details, such as the line where the error occurs. Do you have a traceback snippet? – e.arbitrio Mar 02 '21 at 22:18
  • I updated the original post with the error message – Renato Ferreira Mar 02 '21 at 22:37
  • It seems that you are basically passing 2 pandas Series to your split. But the docs --> https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html <-- say you should pass pandas DataFrames or lists. So I would try converting X and y to lists, for example, or use a DataFrame directly rather than a pandas Series – e.arbitrio Mar 02 '21 at 22:54
  • My DataFrame was generated with pd.read_csv by passing my file path. I used the same train_test_split technique with KNeighborsClassifier and it worked just fine. Do you have any suggestions for how I should proceed? (Sorry about any typos, English is not my main language.) – Renato Ferreira Mar 02 '21 at 23:05

1 Answer


I tried your code. According to https://github.com/scikit-learn/scikit-learn/blob/95119c13a/sklearn/model_selection/_search.py#L723, the arguments passed to fit should be array-like. So I converted them using numpy and the error went away.

Here is a snippet with the conversion I did.

from skmultilearn.adapt import MLkNN
from sklearn.model_selection import GridSearchCV, train_test_split
import numpy as np

parameters = {'k': range(1, 3), 's': [0.5, 0.7, 1.0]}
score = 'f1_macro'

X = dados.drop(['defects'], axis=1)
y = dados['defects']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
classifier = GridSearchCV(MLkNN(), parameters, scoring=score)
# Convert the pandas objects to numpy arrays before fitting
classifier.fit(np.array(X_train), np.array(y_train))
e.arbitrio
  • I tried your solution and now get the error shown below: `/usr/local/lib/python3.7/dist-packages/scipy/sparse/lil.py in _get_row_ranges(self, rows, col_slice) 296 new.rows, new.data, 297 rows, --> 298 j_start, j_stop, j_stride, nj) 299 300 return new _csparsetools.pyx in scipy.sparse._csparsetools.lil_get_row_ranges() ValueError: row index 147 out of bounds` – Renato Ferreira Mar 03 '21 at 13:41
  • Did you check the shapes of `X` and `y`? X must have shape (n_samples, n_features) and y must have shape (n_samples, n_outputs) or (n_samples,) – e.arbitrio Mar 03 '21 at 14:44
  • X has a shape of (373, 21) and y has a shape of (373,) – Renato Ferreira Mar 03 '21 at 23:14
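
Given the shapes reported above, one thing that may be worth trying is passing y as a 2-D indicator matrix of shape (n_samples, n_labels) instead of a 1-D array. This is an assumption, not something confirmed in the thread: MLkNN is a multi-label classifier, and a 1-D y of shape (373,) may end up being read as a single row, which would be consistent with the "row index out of bounds" error. The sketch below only shows the reshaping; the small `dados` frame and its column names other than `defects` are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `dados` DataFrame from the question.
dados = pd.DataFrame({
    "loc": [10, 25, 7, 40],
    "branches": [1, 4, 0, 9],
    "defects": [False, True, False, True],
})

# Feature matrix as a plain numpy array of shape (n_samples, n_features).
X = dados.drop(["defects"], axis=1).to_numpy()

# Assumption: `defects` is a single boolean/0-1 label column.
# Reshape it into a column so y has shape (n_samples, 1) rather than (n_samples,).
y = dados["defects"].astype(int).to_numpy().reshape(-1, 1)

print(X.shape, y.shape)
```

These arrays would then go through train_test_split and classifier.fit in place of the pandas objects. If there really is only one label column, a single-label classifier such as the KNeighborsClassifier mentioned in the comments may be the simpler choice.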