A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array

Question

The code is as follows. I am trying to train GBRT (gradient boosted regression trees) on this data. The same data works fine with other classifiers but gives the above error for GBRT. Please help:

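import scipy.sparse as sp
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer

dataset = load_files('train')
vectorizer = TfidfVectorizer(encoding='latin1')
# fit_transform returns a scipy.sparse matrix, not a dense array
X_train = vectorizer.fit_transform((open(f).read() for f in dataset.filenames))
assert sp.issparse(X_train)
print("n_samples: %d, n_features: %d" % X_train.shape)
y_train = dataset.target

def benchmark(clf_class, params, name):
    clf = clf_class(**params).fit(X_train, y_train)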

If you are using `GradientBoostingRegressor`, it doesn't accept sparse matrices as input. – Moj, May 28 '15 at 09:35
I converted using `arX = np.array(X_train)`, but now the error says 'tuple index out of range'. – Dhananjay Ambekar, May 28 '15 at 10:36
Do as the error says: `Use X.toarray() to convert to a dense numpy array`. So `X_train.toarray()`. – Moj, May 28 '15 at 10:50
'numpy.ndarray' object (the type of X_train) has no attribute 'toarray'. – Dhananjay Ambekar, May 28 '15 at 11:34

score 16 · Answer 1 · answered May 16 '16 at 07:28

I came across the same problem when training a GradientBoostingClassifier on data loaded by load_svmlight_files. I solved it by converting the sparse matrix to a dense one:

X_train = X_train.todense()
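
A quick sketch of the difference between the two conversion methods, since the error message suggests `toarray()` while this answer uses `todense()`. The tiny matrix here is made up for illustration and is not the asker's data. As far as I know, newer scikit-learn releases no longer accept `numpy.matrix` input, so `toarray()` is probably the safer choice.

```python
import numpy as np
import scipy.sparse as sp

# Toy stand-in for X_train (hypothetical data, not from the question)
X_sparse = sp.csr_matrix(np.array([[0.0, 1.5, 0.0],
                                   [2.0, 0.0, 0.0]]))

X_dense = X_sparse.toarray()    # plain numpy.ndarray
X_matrix = X_sparse.todense()   # numpy.matrix, an ndarray subclass that is always 2-D

print(type(X_dense).__name__, type(X_matrix).__name__)
```

Both calls return new objects and leave `X_sparse` unchanged, so the result has to be assigned to a variable before it is passed to `fit`.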

answered May 16 '16 at 07:28

Peiqin


score 5 · Answer 2 · answered May 28 '15 at 09:44

Because GBRT in sklearn requires X (the training data) to be array-like, not a sparse matrix: sklearn-gbrt

I hope this helps!
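
Here is a minimal sketch of the whole flow, assuming a few made-up in-memory documents and targets instead of the asker's `load_files('train')` data. The names `docs`, `y`, and `reg` are only for illustration. I believe newer scikit-learn releases accept sparse input for gradient boosting directly, so check your version before converting.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus and targets (not the data from the question)
docs = [
    "sparse matrix error",
    "dense numpy array",
    "gradient boosting regression",
    "convert sparse to dense array",
]
y = [1.0, 2.0, 1.5, 3.0]

vectorizer = TfidfVectorizer()
X_sparse = vectorizer.fit_transform(docs)   # scipy.sparse matrix

# Convert on the sparse matrix itself, as the error message suggests
X_dense = X_sparse.toarray()

reg = GradientBoostingRegressor(n_estimators=10, random_state=0)
reg.fit(X_dense, y)
preds = reg.predict(X_dense)
print(X_dense.shape, preds.shape)
```

Keep in mind that a dense copy stores every zero explicitly, so with a large TF-IDF vocabulary it can need far more memory than the sparse matrix.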

answered May 28 '15 at 09:44

Chung-Yen Hung


I converted using `arX = np.array(X_train)`, but now the error says 'tuple index out of range'. – Dhananjay Ambekar May 28 '15 at 10:37
X_train is a `scipy.sparse` matrix, not a native Python `list`. – Chung-Yen Hung May 28 '15 at 10:47
I am converting from to still gives tuple error. – Dhananjay Ambekar May 28 '15 at 11:33
Well, `np.array(X_train)` doesn't convert. It wraps the whole sparse matrix in an array of dtype object. Have you looked at the error message? – Andreas Mueller May 29 '15 at 15:42
It doesn't give an error! What's the best way to use a sparse.csr_matrix with the GBRT classifier? How do I convert it to dense array data? – Dhananjay Ambekar May 31 '15 at 09:53
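
To make the point in these comments concrete, here is a small sketch (toy identity matrix, not the asker's data) comparing `np.array(...)` with `.toarray()`. The wrapped result is an object array holding the sparse matrix, typically with no rows or columns of its own, which would explain both the 'tuple index out of range' error and the later "no attribute 'toarray'" message.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical 3x3 sparse matrix standing in for X_train
X_sparse = sp.csr_matrix(np.eye(3))

wrapped = np.array(X_sparse)   # object array that holds the sparse matrix itself
dense = X_sparse.toarray()     # real numeric array with shape (3, 3)

print(wrapped.dtype, wrapped.shape)
print(dense.dtype, dense.shape)
```

So the conversion has to be called on the original sparse matrix, before it has been overwritten by the result of `np.array`.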

score 1 · Answer 3 · edited Aug 02 '20 at 12:21

The `categorical_features` parameter of OneHotEncoder is no longer supported in recent versions of the sklearn library. The code is modified to use ColumnTransformer to select the column instead.

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([('town', OneHotEncoder(), [0])], remainder='passthrough')

X = ct.fit_transform(X)
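
One thing to watch with this approach: `ColumnTransformer` can itself return a sparse matrix when the encoded output is mostly zeros, which would bring back the same error. A sketch of one way around that, assuming a made-up two-column array (a town name and a number), is to pass `sparse_threshold=0` so the result is always dense.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column 0 is a category, column 1 is numeric
X = np.array([["A", 1.0],
              ["B", 2.0],
              ["A", 3.0]], dtype=object)

ct = ColumnTransformer(
    [("town", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,   # always return a dense array
)
X_out = ct.fit_transform(X)
print(type(X_out).__name__, X_out.shape)
```

Calling `.toarray()` on the transformer's output when it is sparse should work as well.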

edited Aug 02 '20 at 12:21

Divyessh


answered Aug 02 '20 at 07:04

Mayank Lohani


score 1 · Answer 4 · answered Jun 19 '22 at 10:41

The problem is that you are using sklearn's OneHotEncoder.

Use the one from sksurv instead:

from sksurv.preprocessing import OneHotEncoder

answered Jun 19 '22 at 10:41

Berk


