I have to classify articles into my own custom categories, so I chose MultinomialNB from scikit-learn. I am doing supervised learning: an editor looks at the articles daily and tags them, and once they are tagged I feed them into my learning model, and so on. Below is the code to give you an idea of what I am doing and using (I am not including any import lines because I am only trying to show the approach). (Reference)
corpus = train_set                                   # list of tagged article texts
vectorizer = HashingVectorizer(stop_words='english', non_negative=True)  # keep hashed features non-negative for MultinomialNB
x = vectorizer.transform(corpus)
data_array = x.toarray()                             # dense feature matrix for partial_fit

cat_array = np.array(list(cat_set))                  # category label for each article

filename = '/home/ubuntu/Classifier/Intelligence-MultinomialNB.pkl'
if not os.path.exists(filename):
    classifier = MultinomialNB()                     # first run: create a fresh classifier
    classifier.partial_fit(data_array, cat_array, classes)  # classes = full list of categories
    print("Saving Classifier")
    joblib.dump(classifier, filename, compress=9)
else:
    print("Loading Classifier")
    classifier = joblib.load(filename)
    classifier.partial_fit(data_array, cat_array)    # incremental update with the new batch
    print("Saving Classifier")
    joblib.dump(classifier, filename, compress=9)
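For context, classifying a new article with the saved model looks roughly like this (just a sketch; new_article_text is an illustrative variable holding the raw text of an unseen article):

# load the persisted model and vectorize the unseen article the same way as the training data
classifier = joblib.load(filename)
new_x = vectorizer.transform([new_article_text]).toarray()
predicted_category = classifier.predict(new_x)[0]
print(predicted_category)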
Now I have a classifier ready after this custom tagging, and it works like a charm on new articles. A new requirement has come up: I need the most frequent words for each category. In short, I have to extract features from the learned model. Looking through the documentation, I only found how to extract text features at the time of learning.
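What I found looks something like the sketch below: a stateful vectorizer such as CountVectorizer keeps the column-index-to-word mapping, but only because it is fitted on the corpus at learning time (illustrative only, it is not what I use in production):

# a stateful vectorizer remembers which word each feature column corresponds to
count_vec = CountVectorizer(stop_words='english')
counts = count_vec.fit_transform(corpus)
feature_names = count_vec.get_feature_names()   # one word per column (get_feature_names_out in newer scikit-learn)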
But once the model is trained and all I have is the model file (.pkl), is it possible to load that classifier and extract features from it?
Will it be possible to get the most frequent terms for each class or category?
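In other words, something like the following is what I am hoping is possible. As far as I can tell, feature_log_prob_ and classes_ are attributes of a fitted MultinomialNB, but I do not know how to map the hashed column indices back to actual words, since HashingVectorizer keeps no vocabulary:

classifier = joblib.load(filename)
# shape (n_classes, n_features): one row of per-feature log probabilities per category
log_probs = classifier.feature_log_prob_
for class_label, row in zip(classifier.classes_, log_probs):
    top_indices = np.argsort(row)[-20:][::-1]   # 20 highest-weighted hashed features
    print(class_label, top_indices)             # these are only column indices - how do I get the words back?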