In an attempt to classify text I want to use SVM. I want to classify test data into one of the labels(health/adult) The training & test data are text files
I am using python's scikit library.
While I was saving the text to txt files I encoded it in utf-8
that's why i am decoding them in the snippet.
Here's my attempted code
String = String.decode('utf-8')
String2 = String2.decode('utf-8')
bigram_vectorizer = CountVectorizer(ngram_range=(1, 2),
token_pattern=r'\b\w+\b', min_df=1)
X_2 = bigram_vectorizer.fit_transform(String2).toarray()
X_1 = bigram_vectorizer.fit_transform(String).toarray()
X_train = np.array([X_1,X_2])
print type(X_train)
y = np.array([1, 2])
clf = SVC()
clf.fit(X_train, y)
#prepare test data
print(clf.predict(X))
This is the error I am getting
File "/Users/guru/python_projects/implement_LDA/lda/apply.py", line 107, in <module>
clf.fit(X_train, y)
File "/Users/guru/python_projects/implement_LDA/lda/lib/python2.7/site-packages/sklearn/svm/base.py", line 150, in fit
X = check_array(X, accept_sparse='csr', dtype=np.float64, order='C')
File "/Users/guru/python_projects/implement_LDA/lda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 373, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.
When I searched for the error, I found some results but they even didn't help. I think I am logically wrong here in applying SVM model. Can someone give me a hint on this?