I have a large sparse matrix of shape (95000, 12000) containing the features of my model. I want to run stratified k-fold cross-validation using the sklearn.cross_validation module in Python. However, I haven't found a way to index a sparse matrix in Python.

Is there any way I can run StratifiedKFold on my sparse feature matrix?

  • Did it give you an error like 'integer cannot be indexed'? – CoderBC Apr 15 '17 at 14:56
  • It is quite clear that you did not even try. Scikit-learn CV works just fine on sparse matrices, as csr_matrices are the default data representation in scikit-learn. – lejlot Nov 08 '15 at 09:33
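
To illustrate the comment's point that scikit-learn's cross-validation utilities accept sparse input directly, here is a minimal sketch. The random matrix, the labels, and the choice of LogisticRegression are assumptions made for the example; they are not part of the question.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical stand-in data: 200 samples, 50 features, ~10% non-zero entries.
rng = np.random.RandomState(0)
X = sparse.random(200, 50, density=0.1, format="csr", random_state=rng)
y = rng.randint(0, 2, size=200)

# Pass the sparse matrix straight in; no manual indexing is needed.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=12345)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores)  # one accuracy value per fold
```

With this route the splitter handles the row selection internally, so manual indexing is only needed when you want to keep the per-fold matrices yourself.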

1 Answer

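Try this:

import numpy as np
from sklearn.model_selection import StratifiedKFold  # replaces the removed sklearn.cross_validation

# Convert to CSR first: CSR supports row selection with an index array
X = x.tocsr()
y = np.asarray(output)  # array form so y[train_index] works

# Per-fold splits, keyed by fold number
X_train, X_test = {}, {}
y_train, y_test = {}, {}

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=12345)
for i, (train_index, test_index) in enumerate(skf.split(X, y)):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train[i], X_test[i] = X[train_index], X[test_index]
    y_train[i], y_test[i] = y[train_index], y[test_index]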
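
To check the approach end to end without the original data, here is a minimal self-contained sketch on a small synthetic matrix. The matrix size, the density, and the 80/20 label split are made-up stand-ins, chosen only so the stratification is easy to see.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in data: 100 samples, 20 features, ~5% non-zero entries.
rng = np.random.RandomState(0)
X = sparse.random(100, 20, density=0.05, format="csr", random_state=rng)
y = np.array([0] * 80 + [1] * 20)  # imbalanced labels, 80/20

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=12345)
folds = []
for train_index, test_index in skf.split(X, y):
    # Indexing a CSR matrix with an integer array selects rows and
    # returns another sparse matrix; nothing is densified.
    X_tr, X_te = X[train_index], X[test_index]
    y_tr, y_te = y[train_index], y[test_index]
    folds.append((X_tr, X_te, y_tr, y_te))

for X_tr, X_te, y_tr, y_te in folds:
    print(X_tr.shape, X_te.shape, "positives in test:", int(y_te.sum()))
```

Each test fold keeps the same class ratio as the full label vector, which is what stratification is meant to guarantee.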
CoderBC
  • Don't know why this was voted down - it worked for me after I had been stuck for about 20 minutes. Thanks! – ljdyer Mar 02 '22 at 21:48