I know there is a library for this in Python:

from sklearn.naive_bayes import MultinomialNB

but I want to know how to create such a classifier from scratch, without using library classes like TfidfVectorizer and MultinomialNB.

fik
  • Welcome to SO fik! Do you mean "*how do I implement Naive Bayes + how do I implement a TFIDF vectorizer*" or "*after importing the library, how do I fit the model to data?*" – Alexander L. Hayes Apr 20 '21 at 03:19
  • How do I implement Naive Bayes with TF-IDF from scratch, without using the library classes TfidfVectorizer and MultinomialNB()? – fik Apr 23 '21 at 04:24
  • The question is too broad to answer here, please review [How to ask](https://stackoverflow.com/help/how-to-ask). For an overview of multinomial naive Bayes, [Dan Jurafsky's slides (slide 41 specifically)](http://web.stanford.edu/~jurafsky/slp3/slides/7_NB.pdf#page=41) has a worked example, and [Gautam Kunapuli's slides are a good reference](https://gkunapuli.github.io/files/cs6375/09-NaiveBayes.pdf). Both explain naive Bayes with respect to the bag of words (`CountVectorizer`) model, but their implementation would be equivalent for a TFIDF vectorizer. – Alexander L. Hayes Apr 23 '21 at 12:55

1 Answer

Here is a step-by-step guide to building a simple MNB classifier with TF-IDF:

  1. First, import TfidfVectorizer to tokenize and vectorize the text in the dataset, MultinomialNB as the classifier, and train_test_split to split the dataset. (All three are available in sklearn.)

  2. Split the dataset into train and test sets.

  3. Initialize TfidfVectorizer, then vectorize the train set with the method fit_transform.

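  4. Vectorize the test set with the method transform (not fit or fit_transform), so that the test set reuses the vocabulary and IDF weights learned from the train set.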

  5. Initialize the classifier by calling the constructor MultinomialNB().

model = MultinomialNB() # with default hyperparameters

  6. Train the classifier with the train set.

model.fit(X_train, y_train)

  7. Test/Validate the classifier with the test set.

y_pred = model.predict(X_test) # predict takes only the features; compare y_pred with y_test to evaluate

Those are the 7 basic steps. You can also add text preprocessing before vectorizing and a model evaluation step after predicting.
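
The steps above can be sketched end to end roughly as follows. The toy texts and labels are made up for illustration, and the split parameters (test_size, random_state, stratify) are assumptions rather than recommended settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy dataset, invented for this sketch only.
texts = [
    "cheap pills buy now", "win money now",
    "cheap money offer", "buy cheap offer now",
    "meeting at noon", "project report attached",
    "lunch meeting tomorrow", "report for the project",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# Step 2: split into train and test sets (parameters are assumptions).
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# Step 3: learn the vocabulary and IDF weights from the train set only.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)

# Step 4: reuse the fitted vectorizer on the test set (transform, not fit).
X_test = vectorizer.transform(X_test_text)

# Steps 5-7: initialize, train, and predict.
model = MultinomialNB()  # default hyperparameters
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Optional evaluation: mean accuracy on the test set.
accuracy = model.score(X_test, y_test)
```

Calling transform on the test set keeps the feature columns identical to those of the train set, which is what MultinomialNB expects at prediction time.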

Dhana D.
  • I don't mean using libraries like TfidfVectorizer and MultinomialNB(), but how to build them from scratch. – fik Apr 23 '21 at 04:21
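
For the from-scratch version asked about here, a minimal sketch in plain Python might look like the following. All function names (tokenize, fit_idf, tfidf, fit_nb, predict) are hypothetical helpers invented for this sketch, the toy data is made up, and the choices of whitespace tokenization, a smoothed IDF of ln((1 + N) / (1 + df)) + 1, and Laplace smoothing with alpha = 1 are assumptions. Unlike sklearn's TfidfVectorizer with default settings, this sketch does not L2-normalize the vectors, so its numbers will not match the library's.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase and split on whitespace, nothing more.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF per term: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc, idf):
    """Raw term count times IDF; terms not seen during fitting are dropped."""
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def fit_nb(docs, labels, idf, alpha=1.0):
    """Multinomial NB where TF-IDF weights play the role of word counts."""
    class_counts = Counter(labels)
    weights = {c: defaultdict(float) for c in class_counts}
    for doc, label in zip(docs, labels):
        for term, w in tfidf(doc, idf).items():
            weights[label][term] += w
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Laplace smoothing: add alpha to every vocabulary term.
        total = sum(weights[c].values()) + alpha * len(idf)
        log_likelihood[c] = {
            t: math.log((weights[c].get(t, 0.0) + alpha) / total) for t in idf
        }
    return log_prior, log_likelihood


def predict(doc, idf, log_prior, log_likelihood):
    """Return the class with the highest log posterior (up to a constant)."""
    features = tfidf(doc, idf)
    scores = {
        c: log_prior[c] + sum(w * log_likelihood[c][t] for t, w in features.items())
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy training data, invented for this sketch only.
train_docs = [
    "cheap pills buy now", "win money now", "cheap money offer",
    "meeting at noon", "project report attached", "lunch meeting tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

idf = fit_idf(train_docs)
log_prior, log_likelihood = fit_nb(train_docs, train_labels, idf)
prediction = predict("cheap money", idf, log_prior, log_likelihood)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a vocabulary word that never appeared in a class from driving that class's score to negative infinity.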