I am trying to do KNN using cosine similarity in scikit-learn, but it keeps throwing these warnings. Can someone explain what they mean, and why they appear only when I fit a KNN model with cosine similarity and not with any other distance metric?
Code:
import time
import numpy
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors
t0 = time.time()
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
vectorizer = TfidfVectorizer()
vec_fit = vectorizer.fit_transform(X)
t1 = time.time()
total = t1-t0
print "TF-IDF built:", total
#######################------------------------############################
t0 = time.time()
nbrs = NearestNeighbors(n_neighbors=20, algorithm='auto', metric=cosine_similarity)
nbrs.fit(X_train_tfidf.toarray())#,Y)
# KD_TREE won't work here because it doesn't support sparse matrices; given a dense matrix, it throws a memory error
t1 = time.time()
total = t1-t0
print "KNN Built:", total
Repeated warning message:
C:\Anaconda2\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
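For comparison, here is a minimal sketch of the same kind of fit using the built-in metric name 'cosine' instead of passing the cosine_similarity function. As far as I understand, a callable metric is invoked on pairs of individual 1-D rows, while cosine_similarity expects 2-D inputs and returns a similarity rather than a distance, which may be what triggers the warning on every call. The toy corpus `docs` is hypothetical and stands in for X; algorithm='brute' is chosen here because it accepts sparse input.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy corpus standing in for X from the question.
docs = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock markets fell sharply today",
    "markets rallied after the announcement",
]

# TfidfVectorizer combines CountVectorizer and TfidfTransformer in one step.
tfidf = TfidfVectorizer().fit_transform(docs)  # sparse CSR matrix, no .toarray() needed

# Built-in cosine distance selected by its string name; brute force works on sparse data.
nbrs = NearestNeighbors(n_neighbors=2, algorithm="brute", metric="cosine")
nbrs.fit(tfidf)

# Query with the first document; cosine distance is 1 minus cosine similarity.
distances, indices = nbrs.kneighbors(tfidf[0])
print(indices)
print(distances)
```

Keeping the matrix sparse also avoids the dense conversion that caused the memory error with the tree-based algorithms.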
Following the suggestion in the warning, I tried this:
nbrs = NearestNeighbors(n_neighbors=20, algorithm='auto', metric=cosine_similarity)
nbrs.fit(numpy.array(X_train_tfidf).reshape(1, -1))
which throws the following error:
Traceback (most recent call last):
  File ".\tf-idf.py", line 54, in <module>
    nbrs.fit(numpy.array(X_train_tfidf).reshape(1, -1))
  File "C:\Miniconda2\lib\site-packages\sklearn\neighbors\base.py", line 816, in fit
    return self._fit(X)
  File "C:\Miniconda2\lib\site-packages\sklearn\neighbors\base.py", line 221, in _fit
    X = check_array(X, accept_sparse='csr')
  File "C:\Miniconda2\lib\site-packages\sklearn\utils\validation.py", line 373, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.
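A small sketch of what seems to be going on with the numpy.array(...) call above, assuming X_train_tfidf is a SciPy sparse matrix (which is what TfidfTransformer returns). In the NumPy/SciPy versions I am aware of, numpy.array does not densify a sparse matrix; it wraps the matrix object itself in an object array, which scikit-learn then cannot convert to floats. The 2x3 matrix `sparse` below is a hypothetical stand-in.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 2x3 sparse matrix standing in for X_train_tfidf.
sparse = csr_matrix(np.array([[0.0, 0.5, 0.0], [0.7, 0.0, 0.3]]))

# np.array() does not densify a SciPy sparse matrix; it wraps the object itself.
wrapped = np.array(sparse)
print(wrapped.dtype, wrapped.shape)

# .toarray() performs the actual conversion to a dense 2-D float array.
dense = sparse.toarray()
print(dense.dtype, dense.shape)
```

Note that even on a dense array, reshape(1, -1) would flatten all documents into a single sample, which is not what a nearest-neighbour fit over documents needs.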