How to implement multinomial naive Bayes with TF-IDF from scratch in Python?

Question

I know there is a library for this in Python:

from sklearn.naive_bayes import MultinomialNB

but I want to know how to build one from scratch, without using library classes like TfidfVectorizer and MultinomialNB.

Welcome to SO, fik! Do you mean "*how do I implement Naive Bayes + how do I implement a TFIDF vectorizer*" or "*after importing the library, how do I fit the model to data?*" — Alexander L. Hayes, Apr 20 '21 at 03:19
How do I implement Naive Bayes with TF-IDF from scratch, without using the library classes TfidfVectorizer and MultinomialNB()? — fik, Apr 23 '21 at 04:24
The question is too broad to answer here, please review [How to ask](https://stackoverflow.com/help/how-to-ask). For an overview of multinomial naive Bayes, [Dan Jurafsky's slides (slide 41 specifically)](http://web.stanford.edu/~jurafsky/slp3/slides/7_NB.pdf#page=41) has a worked example, and [Gautam Kunapuli's slides are a good reference](https://gkunapuli.github.io/files/cs6375/09-NaiveBayes.pdf). Both explain naive Bayes with respect to the bag of words (`CountVectorizer`) model, but their implementation would be equivalent for a TFIDF vectorizer. — Alexander L. Hayes, Apr 23 '21 at 12:55
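A minimal from-scratch sketch of the idea in the comment above, using only the standard library. It assumes whitespace tokenization, a smoothed IDF of log((1 + N) / (1 + df)) + 1, no length normalization, and Laplace smoothing with alpha = 1. The function names and the toy corpus are made up for illustration. The TF-IDF weights simply take the place of raw word counts in the usual multinomial estimates.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def fit_tfidf(docs):
    """Learn smoothed IDF weights from the training documents only."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def transform_tfidf(doc, idf):
    """Map one document to {term: tf * idf}; terms unseen in training are dropped."""
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def fit_mnb(vectors, labels, vocab, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    class_docs = Counter(labels)
    weight = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, value in vec.items():
            weight[label][term] += value  # summed TF-IDF weight per class and term
    log_prior, log_like = {}, {}
    for c in class_docs:
        log_prior[c] = math.log(class_docs[c] / len(labels))
        total = sum(weight[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((weight[c][t] + alpha) / total) for t in vocab}
    return log_prior, log_like


def predict_mnb(vec, log_prior, log_like):
    """Return the class with the highest log posterior score."""
    scores = {
        c: log_prior[c] + sum(v * log_like[c][t] for t, v in vec.items())
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
train_docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "lunch meeting on monday",
]
train_labels = ["spam", "spam", "ham", "ham"]

idf = fit_tfidf(train_docs)
train_vectors = [transform_tfidf(d, idf) for d in train_docs]
log_prior, log_like = fit_mnb(train_vectors, train_labels, list(idf))

test_vector = transform_tfidf("buy cheap pills", idf)
print(predict_mnb(test_vector, log_prior, log_like))
```

The IDF weights are learned from the training documents only and then reused for new text, which is the same split that fit_transform and transform express in a library vectorizer.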

score 0 · Answer 1 · answered Apr 20 '21 at 03:24

Here are the steps to build a simple multinomial naive Bayes (MNB) classifier with TF-IDF:

First, import TfidfVectorizer to tokenize and weight the terms in the dataset, MultinomialNB as the classifier, and train_test_split for splitting the dataset. (All three are available in sklearn.)
Split the dataset into train and test sets.
Construct a TfidfVectorizer, then vectorize the train set with its fit_transform method.
Vectorize the test set with the transform method (not fit or fit_transform), so that the vocabulary and IDF weights learned from the train set are reused.
Initialize the classifier by calling the constructor MultinomialNB().

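model = MultinomialNB()  # with default hyperparameters

model.fit(X_train, y_train)  # train on the TF-IDF features of the train set

y_pred = model.predict(X_test)  # predict takes only the features; compare y_pred with y_test to evaluate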

Those seven steps are the basic workflow. You can also add text preprocessing and model evaluation.
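Put together, the steps above might look like the sketch below. The texts and labels are a hypothetical toy corpus, and the split size and random seed are arbitrary choices; everything else uses scikit-learn's defaults.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus; replace with your own texts and labels.
texts = [
    "cheap pills buy now",
    "buy cheap watches now",
    "win money now",
    "cheap money offer",
    "meeting agenda for monday",
    "lunch meeting on monday",
    "project status report",
    "agenda for the project review",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# Split the raw texts first, so the test set stays unseen by the vectorizer.
X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_txt)  # learn vocabulary and IDF on the train set
X_test = vectorizer.transform(X_test_txt)        # reuse them for the test set

model = MultinomialNB()  # with default hyperparameters
model.fit(X_train, y_train)

y_pred = model.predict(X_test)          # predicted labels for the test set
accuracy = model.score(X_test, y_test)  # fraction of correct predictions
print(y_pred, accuracy)
```

With a corpus this small the accuracy figure means little; the point is only the order of the calls.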

I don't mean using library classes like TfidfVectorizer and MultinomialNB(); I mean how to build them from scratch. — fik, Apr 23 '21 at 04:21
