How to use a trained text classification model

Question

I implemented an SVM model that classifies a given text into one of two categories. The model was trained and tested on the data.csv data set. Now I want to use this model with live data. To do that I used joblib, a pickle-based Python library. First I saved the model.

import joblib

joblib.dump(clf, "model.pkl")

Then I loaded the model.

classifer = joblib.load("model.pkl")

Then I passed the following text to be classified.

new_observation = "this news should be in one category"
classifer.predict([new_observation])

But running this raises an error.

ValueError: could not convert string to float: 'this news should be in one category'

I followed the scikit-learn documentation on how to save and load a trained model: https://scikit-learn.org/stable/modules/model_persistence.html

EDIT

Here is the code I used to create the SVM model.

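import string

import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = pd.read_csv('data1.csv', encoding='cp1252')

def pre_process(text):
    # Strip punctuation, drop English stop words, then stem each remaining word.
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = [word for word in text.split() if word.lower() not in stopwords.words('english')]
    stemmer = SnowballStemmer("english")
    words = ""
    for i in text:
        words += stemmer.stem(i) + " "
    return words

textFeatures = data['textForCategorized'].copy()
textFeatures = textFeatures.apply(pre_process)

# Convert the cleaned text to a numeric TF-IDF matrix; the SVM is trained on this, not on raw strings.
# stop_words is passed by keyword because the first parameter of TfidfVectorizer is `input`.
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(textFeatures)

features_train, features_test, labels_train, labels_test = train_test_split(features, data['class'], test_size=0.3, random_state=111)

svc = SVC(kernel='sigmoid', gamma=1.0)
clf = svc.fit(features_train, labels_train)
prediction = svc.predict(features_test)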

After training the model, this is how I tried to pass input to it.

joblib.dump(clf, "model.pkl")

classifer = joblib.load("model.pkl")

new_observation = "This news should be in one category"

classifer.predict(new_observation)

EDIT

joblib.dump(clf, "model.pkl") 
classifer = joblib.load("model.pkl")
textFeature = "Dengue soaring in ......"
textFeature = pre_process(textFeature)
classifer.predict(textFeature.encode())

This is the code I now use to load the model and pass text to it. The prediction call raises an error.

ValueError: could not convert string to float: b'dengu soar '

How did you train the model? SVMs accept only numerical input in the form of a bidimensional numpy or scipy array. Try showing also the training code — lsabi, Feb 02 '20 at 14:39
For training, Strings were used directly. That is the issue. The string should be converted into numeric before training. Right? Thanks, Isabi. — Dinuka, Feb 04 '20 at 16:54
Yes, theoretically, it should not work. Are you sure that the training data is not converted into a numerical representation first, like one-hot encoding? Do you have access to the training code? Do you mind posting it? — lsabi, Feb 04 '20 at 20:55
I modified the code above with the encoding. After modifying the code with encoding I got the same error. Can you see any issues with this code? — Dinuka, Feb 07 '20 at 05:13
@Reuben answered with the right answer: you are not transforming the input that you are feeding into the model. See his answer — lsabi, Feb 07 '20 at 08:15

score 0 · Answer 1 · answered Feb 07 '20 at 05:54

You should pre-process new_observation before feeding it to the model. You only pre-processed textFeatures for training; you must repeat the same steps for new_observation, reusing the vectorizer that was fitted on the training data:

1. Apply the pre_process() function to new_observation.
2. Use vectorizer.transform() (not fit_transform()) on the output of pre_process(new_observation), and pass the resulting matrix to predict().
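
The two steps above can be sketched as follows. This is only a minimal sketch under assumptions: the toy texts and labels, the simplified pre_process(), the linear kernel, and the vectorizer.pkl file name are illustrative and do not come from the question. The point it shows is that the fitted vectorizer is saved next to the classifier and reused with transform() on new text.

```python
import os
import string
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def pre_process(text):
    # Simplified stand-in for the question's pre_process():
    # lowercase and strip punctuation (no stop-word removal or stemming here).
    return text.lower().translate(str.maketrans("", "", string.punctuation))


# Hypothetical toy training data, only to make the sketch runnable.
train_texts = [
    "Dengue cases are soaring in the city",
    "Hospital reports new fever outbreak",
    "Team wins the cricket final",
    "Striker scores twice in the football match",
]
train_labels = ["health", "health", "sport", "sport"]

# Training side: fit the vectorizer once, on the training text only.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform([pre_process(t) for t in train_texts])
clf = SVC(kernel="linear").fit(features, train_labels)

# Persist BOTH fitted objects; the classifier alone cannot turn text into numbers.
model_dir = tempfile.mkdtemp()
joblib.dump(clf, os.path.join(model_dir, "model.pkl"))
joblib.dump(vectorizer, os.path.join(model_dir, "vectorizer.pkl"))  # assumed file name

# Live side: load both objects and repeat the same steps on the new text.
classifier = joblib.load(os.path.join(model_dir, "model.pkl"))
loaded_vectorizer = joblib.load(os.path.join(model_dir, "vectorizer.pkl"))

new_observation = "Dengue soaring in ......"
# transform(), not fit_transform(): reuse the vocabulary learned during training.
new_features = loaded_vectorizer.transform([pre_process(new_observation)])
prediction = classifier.predict(new_features)
print(prediction)
```

Because transform() reuses the training vocabulary, new_features has the same number of columns as the training matrix, so no resizing is needed before predict().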

Reuben

Thanks for the answer. When I tried that way I got an error. I added the error in the question section with the code. – Dinuka Feb 16 '20 at 10:58

score 0 · Answer 2 · answered Sep 18 '20 at 07:04

I had the same issue and resolved it by resizing the feature matrix of the single input string to match the shape of the training data.

Complete code:

joblib.dump(clf, "model.pkl")
classifer = joblib.load("model.pkl")
textFeature = "Dengue soaring in ......"
vocabulary = pre_process(textFeature)
vocabulary_df = pd.Series(vocabulary)

# Feature extraction using TfidfVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(stop_words='english')

test_ = vectorizer.fit_transform(vocabulary_df.values)

# Pad the single-row matrix to the number of columns used in training
test_.resize(1, features_train.shape[1])
classifer.predict(test_)
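
One thing to watch with resizing: a vectorizer fitted on the new string alone assigns its own column indices, so after padding, the columns may not correspond to the vocabulary the classifier was trained on. A possible alternative is to bundle the vectorizer and the classifier in a scikit-learn Pipeline and persist that single object. The sketch below is hedged accordingly: the toy data, the linear kernel, and the text_model.pkl file name are assumptions, and the vectorizer's built-in lowercasing and stop-word removal stand in for pre_process() (stemming is omitted).

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy training data, only to make the sketch runnable.
train_texts = [
    "Dengue cases are soaring in the city",
    "Hospital reports new fever outbreak",
    "Team wins the cricket final",
    "Striker scores twice in the football match",
]
train_labels = ["health", "health", "sport", "sport"]

# The pipeline fits the vectorizer and the SVM together, so the
# vocabulary and the classifier weights always stay in sync.
text_clf = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    SVC(kernel="linear"),
)
text_clf.fit(train_texts, train_labels)

# One file now holds everything needed to go from raw text to a label.
model_path = os.path.join(tempfile.mkdtemp(), "text_model.pkl")  # assumed file name
joblib.dump(text_clf, model_path)

# Live side: load the pipeline and pass raw strings directly.
loaded_clf = joblib.load(model_path)
prediction = loaded_clf.predict(["Dengue soaring in ......"])
print(prediction)
```

With this layout, predict() accepts a list of raw strings, and the stored vectorizer applies the training vocabulary automatically.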
