14

My code is as follows. I am trying to train GBRT (gradient boosted regression trees) on my training data. The same data works well with other classifiers but gives the above error with GBRT. Please help:

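import scipy.sparse as sp
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer

dataset = load_files('train')
vectorizer = TfidfVectorizer(encoding='latin1')
# fit_transform returns a scipy sparse matrix
X_train = vectorizer.fit_transform(open(f, encoding='latin1').read() for f in dataset.filenames)
assert sp.issparse(X_train)
print("n_samples: %d, n_features: %d" % X_train.shape)
y_train = dataset.target

def benchmark(clf_class, params, name):
    clf = clf_class(**params).fit(X_train, y_train)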
bakkal
Dhananjay Ambekar

4 Answers

16

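I came across the same problem when trying to train a GradientBoostingClassifier on data loaded by load_svmlight_files. I solved it by converting the sparse matrix to a dense numpy array. Note that toarray() returns a numpy.ndarray, whereas todense() returns a numpy.matrix.

X_train = X_train.toarray()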
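
A minimal sketch of the dense conversion, assuming a tiny in-memory corpus in place of the files under 'train' (the documents, labels, and n_estimators value are made up for illustration). Converting to dense can use a lot of memory when there are many features, so treat this as a workaround for small data rather than a general recipe.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the real training files.
docs = ["spam spam offer", "meeting at noon", "cheap offer now", "lunch meeting today"]
labels = np.array([1, 0, 1, 0])

# TfidfVectorizer produces a scipy sparse matrix.
X_sparse = TfidfVectorizer().fit_transform(docs)
assert sp.issparse(X_sparse)

# toarray() gives a plain numpy.ndarray with the same shape.
X_dense = X_sparse.toarray()

# Fit the gradient boosting model on the dense array.
clf = GradientBoostingClassifier(n_estimators=10, random_state=0)
clf.fit(X_dense, labels)
predictions = clf.predict(X_dense)
```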
Peiqin
5

This happens because GBRT in sklearn (at least in the version used here) requires X (the training data) to be array-like, not a sparse matrix: sklearn-gbrt
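
One way to guard against this is a small helper that converts only when the input is actually sparse. This is a sketch, not part of sklearn: ensure_dense is a hypothetical name, and the 2x2 matrix is made-up data. The byte count shows how to estimate the size of the dense copy before creating it.

```python
import numpy as np
import scipy.sparse as sp

def ensure_dense(X):
    """Return X as a numpy.ndarray, converting only if it is sparse."""
    if sp.issparse(X):
        return X.toarray()
    return np.asarray(X)

# Hypothetical 2x2 example: the same values as sparse and as a nested list.
X_sparse = sp.csr_matrix(np.array([[0.0, 1.5], [2.0, 0.0]]))
X_from_sparse = ensure_dense(X_sparse)
X_from_list = ensure_dense([[0.0, 1.5], [2.0, 0.0]])

# Memory needed by the dense copy: rows * columns * bytes per element.
dense_bytes = X_sparse.shape[0] * X_sparse.shape[1] * X_sparse.dtype.itemsize
```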

I hope this helps!

Chung-Yen Hung
1

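The categorical_features parameter of OneHotEncoder is no longer supported in recent versions of the sklearn library. The code is modified to use ColumnTransformer, which applies the encoder to the chosen column and passes the remaining columns through unchanged.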

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([('town', OneHotEncoder(), [0])], remainder = 'passthrough')

X = ct.fit_transform(X)
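
A self-contained sketch of the same idea on made-up data (the town names and areas are hypothetical). The sparse_threshold=0 argument is an addition not in the snippet above; it asks ColumnTransformer to return a dense array, which is the form the GBRT error in the question calls for.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column 0 is a town name, column 1 is an area.
X = np.array(
    [["monroe", 2600], ["monroe", 3000], ["windsor", 2800]],
    dtype=object,
)

# One-hot encode column 0 and pass the other columns through unchanged.
# sparse_threshold=0 forces a dense result.
ct = ColumnTransformer(
    [("town", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = ct.fit_transform(X)

# The first two columns are the one-hot indicators, the last is the area.
one_hot_part = np.asarray(X_encoded[:, :2], dtype=float)
```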

Divyessh
1

The problem is that you are using sklearn's OneHotEncoder.

You need to use the following instead:

from sksurv.preprocessing import OneHotEncoder

Berk