
I am running into a ValueError saying my array is invalid, which is strange because I have confirmed that it is not empty: I printed the `len` of both the feature list and the label list. The error is:

Found array with 0 feature(s) (shape=(7, 0)) while a minimum of 1 is required by SVC.

I am using spaCy 3.4.1 and Python 3.8.10. What am I doing wrong?

```python
import spacy
from sklearn import svm

nlp = spacy.load("en_core_web_trf")

train_x = [
    "good characters and plot progression",
    "check out the book",
    "good story. would recommend",
    "novel recommendation",
    "need to make a deposit to the bank",
    "balance inquiry savings",
    "save money",
]

train_y = [
    "BOOKS",
    "BOOKS",
    "BOOKS",
    "BOOKS",
    "BANK",
    "BANK",
    "BANK",
]

docs = [nlp(text) for text in train_x]
train_x_vectors = [doc.vector for doc in docs]

print(len(train_x_vectors))
print(len(train_y))

clf_svm = svm.SVC(kernel='linear')
clf_svm.fit(train_x_vectors, train_y)
```
user3655574
`Doc.vector` is going to be empty with trf pipelines in spaCy by default. Are you sure you're not passing empty vectors? – polm23 Oct 17 '22 at 03:57
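The comment explains why printing `len` did not reveal anything: `len` counts the documents in the list, not the number of features in each vector. Below is a minimal NumPy sketch. It assumes, as the comment says, that every `Doc.vector` from en_core_web_trf is an empty float32 array. The vectors here are stand-ins, not real spaCy output:

```python
import numpy as np

# Stand-ins for the seven Doc.vector values from en_core_web_trf.
# Assumption: each one is an empty float32 array of shape (0,),
# as the comment above describes.
train_x_vectors = [np.zeros((0,), dtype="float32") for _ in range(7)]

# len() only counts the outer list, i.e. the number of documents (samples).
print(len(train_x_vectors))  # → 7

# Converting the list gives the matrix scikit-learn actually validates:
# 7 rows (samples) and 0 columns (features).
X = np.asarray(train_x_vectors)
print(X.shape)  # → (7, 0)

# Checking the width of a single vector exposes the problem directly.
print(train_x_vectors[0].shape)  # → (0,)
```

The `(7, 0)` shape is the one quoted in the error message. Inspecting `train_x_vectors[0].shape` (or `len(train_x_vectors[0])`) is the check that would have caught it.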

1 Answer


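The spaCy pipeline package "en_core_web_trf" does not come with word vectors, so `Doc.vector` is empty for every document and each row of your feature matrix has zero columns. That is why `len` reports 7 while scikit-learn sees shape (7, 0). To use `.vector`, switch to a package that includes word vectors, such as en_core_web_lg.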
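Following the answer, one way to catch this before scikit-learn does is to check the vector width explicitly. The sketch below uses a hypothetical helper, `to_feature_matrix`, which is not part of spaCy or scikit-learn. It also uses synthetic NumPy arrays in place of real `doc.vector` values:

```python
import numpy as np


def to_feature_matrix(vectors):
    """Stack per-document vectors into an (n_samples, n_features) matrix.

    Hypothetical helper for illustration: it fails early with a descriptive
    message instead of letting SVC.fit raise the less obvious
    "Found array with 0 feature(s)" error.
    """
    X = np.asarray(vectors, dtype="float32")
    if X.ndim != 2:
        raise ValueError(f"expected a 2-D matrix, got shape {X.shape}")
    if X.shape[1] == 0:
        raise ValueError(
            "document vectors are empty (0 features); load a pipeline that "
            "ships with word vectors, e.g. en_core_web_lg"
        )
    return X


# Empty vectors, like those from en_core_web_trf, are rejected up front.
try:
    to_feature_matrix([np.zeros((0,), dtype="float32") for _ in range(7)])
except ValueError as err:
    print(err)

# Synthetic 300-dimensional vectors (the width of en_core_web_lg's vectors)
# pass the check. Real input would be [doc.vector for doc in docs].
rng = np.random.default_rng(0)
X = to_feature_matrix([rng.random(300, dtype=np.float32) for _ in range(7)])
print(X.shape)  # (7, 300)
```

In the question's script, this would mean replacing en_core_web_trf with `spacy.load("en_core_web_lg")`. The model must be downloaded first with `python -m spacy download en_core_web_lg`. You would then pass `to_feature_matrix(train_x_vectors)` to `clf_svm.fit`.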

Rolito