
I have data like the following:

col1   col2   col3
 2      14    text, text, some text

I went through http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing, but I could only find information on how to vectorize col3 and pass it on for classification. In my scenario, col1 and col2 also hold numerical information that I want to use.

If I pass col1, col2 and col3 without vectorizing, I get an error for col3 because it contains strings.

If I vectorize col3, the output is a sparse matrix. I need to add col1 and col2 to the vectorized data. How do I do that?
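One common way to do this is to turn col1 and col2 into a sparse matrix as well and stack it column-wise next to the CountVectorizer output with `scipy.sparse.hstack`. The sketch below assumes the data sits in a pandas DataFrame with columns named `col1`, `col2` and `col3`; only the first row comes from the table above, the other rows are made-up illustrative values.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative data: first row from the question, the rest invented.
df = pd.DataFrame({
    "col1": [2, 5, 1],
    "col2": [14, 3, 7],
    "col3": ["text, text, some text", "other text", "some more text"],
})

# Vectorize the text column: sparse matrix of shape (n_samples, n_terms).
count_vect = CountVectorizer()
X_text = count_vect.fit_transform(df["col3"])

# Convert the numeric columns to a sparse matrix of shape (n_samples, 2).
X_num = csr_matrix(df[["col1", "col2"]].to_numpy(dtype=float))

# Place the numeric columns first, then the token counts, in one CSR matrix.
X = hstack([X_num, X_text], format="csr")
print(X.shape)  # (3, 6): 2 numeric columns + 4 vocabulary terms
```

The resulting `X` can be passed to `train_test_split` and any scikit-learn classifier that accepts sparse input. Keep in mind that the numeric columns may be on a very different scale from the token counts, so scaling them first can matter for some models.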

I am using scikit-learn.

Grant Miller
royn
  • Please show some code, then we can help you :) – Stev Mar 28 '18 at 08:54
  • `count_vect = CountVectorizer()`; `X_train = count_vect.fit_transform(col3)`; `X_train, X_test, y_train, y_test = train_test_split(X_train, y, test_size=0.33)` – royn Mar 28 '18 at 09:13
  • This is the code I was using. But it only lets me pass the vectorized data. I need to pass col1, col2 and the vectorised col3 together, and I am not sure how to do that because the data types are different. – royn Mar 28 '18 at 09:15
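Building on the code in the comments, a sketch using scikit-learn's `ColumnTransformer` can pass col1 and col2 through unchanged while vectorizing col3, then feed the combined features to a classifier. The column names, the toy rows, the labels `y` and the choice of `LogisticRegression` are all illustrative assumptions. Unlike the comment's code, it splits the raw DataFrame before fitting, so the vectorizer learns its vocabulary from the training rows only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy data and labels shaped like the question's table.
df = pd.DataFrame({
    "col1": [2, 5, 1, 4, 3, 6],
    "col2": [14, 3, 7, 9, 11, 2],
    "col3": ["text, text, some text", "other text", "some more text",
             "more other words", "text words", "some other text"],
})
y = [0, 1, 0, 1, 0, 1]

preprocess = ColumnTransformer([
    # Numeric columns are passed through unchanged.
    ("num", "passthrough", ["col1", "col2"]),
    # A single column name (not a list) so CountVectorizer receives 1-D text.
    ("text", CountVectorizer(), "col3"),
])

model = Pipeline([
    ("features", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Split the raw rows first; stratify keeps both classes in each part.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.33, random_state=0, stratify=y
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Because the vectorizer lives inside the pipeline, `model.predict` applies the same column handling to new rows automatically, so col3 never has to be vectorized by hand at prediction time.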
