Naive Bayes is a popular baseline method for text classification.
Questions tagged [naivebayes]
1035 questions
5 votes, 1 answer
Most important features Gaussian Naive Bayes classifier python sklearn
I am trying to get the most important features for my GaussianNB model. The code from "How to get most informative features for scikit-learn classifiers?"
or from "How to get most informative features for scikit-learn classifier for different…

LN_P
5 votes, 4 answers
How can a machine learning model handle unseen data and unseen label?
I am trying to solve a text classification problem. I have a limited number of labels that capture the category of my text data. If the incoming text data doesn't fit any label, it is tagged as 'Other'. In the below example, I built a text…

Prasanth Regupathy
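One common way to approach the 'Other' case raised in the question above is to reject low-confidence predictions. The following is a minimal sketch under that assumption, not the asker's method: `predict_with_fallback` is a hypothetical helper, and the threshold of 0.6 is an arbitrary illustrative value that would need tuning, since Naive Bayes probabilities are often poorly calibrated.

```python
def predict_with_fallback(class_probs, threshold=0.6, fallback="Other"):
    """Return the most probable label, or `fallback` when confidence is low.

    class_probs: dict mapping label -> predicted probability (assumed input).
    threshold:   illustrative cut-off, not a recommended value.
    """
    label, prob = max(class_probs.items(), key=lambda kv: kv[1])
    return label if prob >= threshold else fallback

# Toy probabilities, made up for illustration only.
confident = predict_with_fallback({"sports": 0.85, "politics": 0.10, "tech": 0.05})
uncertain = predict_with_fallback({"sports": 0.40, "politics": 0.35, "tech": 0.25})
```

Here `confident` keeps the top label, while `uncertain` falls back to 'Other' because no class reaches the threshold.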
5 votes, 1 answer
How to predict desired class using Naive Bayes in Text Classification
I have been implementing a Multinomial Naive Bayes classifier from scratch for text classification in Python.
I calculate the feature counts for each class and the probability distributions for the features.
According to my implementation I get the…

Jahangir Alam
5 votes, 2 answers
Naive Bayes in Spark MLlib
I have a small file 'naivebayestest.txt' with this structure
10 1:1
20 1:2
20 1:2
From this data I'm trying to classify the vector (1). If I understand Bayes correctly the label for (1) should be 10 (with probability 1!). The program in Spark…

RafaelCaballero
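For the three-row file in the question above, it can help to work the arithmetic by hand. The following is a minimal pure-Python sketch, assuming a multinomial model with add-one smoothing on the feature counts and unsmoothed class priors. It is not Spark's implementation, and `train_multinomial_nb` and `posterior` are hypothetical names.

```python
import math
from collections import defaultdict

def train_multinomial_nb(rows, n_features, alpha=1.0):
    """rows: list of (label, feature_counts). Returns (log_prior, log_theta)."""
    doc_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: [0.0] * n_features)
    for label, x in rows:
        doc_counts[label] += 1
        for j, v in enumerate(x):
            feat_counts[label][j] += v
    n_docs = sum(doc_counts.values())
    # Unsmoothed class priors (an assumption of this sketch).
    log_prior = {c: math.log(n / n_docs) for c, n in doc_counts.items()}
    log_theta = {}
    for c, counts in feat_counts.items():
        total = sum(counts) + alpha * n_features
        # Add-alpha smoothed per-class feature probabilities.
        log_theta[c] = [math.log((k + alpha) / total) for k in counts]
    return log_prior, log_theta

def posterior(x, log_prior, log_theta):
    """Normalized class probabilities for a count vector x."""
    scores = {
        c: log_prior[c] + sum(v * lt for v, lt in zip(x, log_theta[c]))
        for c in log_prior
    }
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}

# The data from the question: label followed by the value of feature 1.
rows = [(10, [1]), (20, [2]), (20, [2])]
log_prior, log_theta = train_multinomial_nb(rows, n_features=1)
probs = posterior([1], log_prior, log_theta)
```

With a single feature, the multinomial feature probability is 1 for every class, so the likelihood term cancels and only the prior remains. Under these assumptions the sketch assigns probability 1/3 to label 10 and 2/3 to label 20 for the vector (1).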
5 votes, 2 answers
How to get pseudo-determinant of a square matrix with python
I have a covariance matrix that fails the singularity test while I am computing a Naive Bayes classifier. I am handling the ln(det(sigma)) portion of the equation.
if np.linalg.cond(covarianceMatrix) < 1/sys.float_info.epsilon:
return…

kenny
5 votes, 1 answer
Posterior probability with pymc
(This question was originally posted on stats.O. I moved it here because it relates to pymc and to more general matters within it: in fact the main aim is to gain a better understanding of how pymc works. If any of the moderators believe it not…

rafforaffo
5 votes, 1 answer
Why does Spark ML NaiveBayes output labels that are different from the training data?
I use the NaiveBayes classifier in Apache Spark ML (version 1.5.1) to predict some text categories. However, the classifier outputs labels that are different from the labels in my training set. Am I doing it wrong?
Here is a small example that can…

Pimin Konstantin Kefaloukos
5 votes, 1 answer
combining LSA/LSI with Naive Bayes for document classification
I'm new to the gensim package and vector space models in general, and I'm unsure of what exactly I should do with my LSA output.
To give a brief overview of my goal, I'd like to enhance a Naive Bayes classifier using topic modeling to improve…

Seunginah
4 votes, 2 answers
Text categorization using Naive Bayes
I am working on a text categorization problem using Naive Bayes. I use each word as a feature. I have been able to implement it and I am getting good accuracy.
Is it possible for me to use tuples of words as features?
For example, if…

user1067334
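The idea of using tuples of words as features, raised in the question above, can be sketched as counting word n-grams alongside single words. This is a minimal illustration assuming whitespace tokenization and lowercasing; `ngram_features` is a hypothetical helper, not part of any library.

```python
from collections import Counter

def ngram_features(text, n_max=2):
    """Count word n-grams of length 1..n_max, keyed by tuples of tokens."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[tuple(tokens[i:i + n])] += 1
    return feats

feats = ngram_features("not good not bad")
```

In this toy input the bigrams `("not", "good")` and `("not", "bad")` become separate features, which single-word counts cannot distinguish. Overlapping n-grams are correlated by construction, so they strain the independence assumption further, although this often still works in practice.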
4 votes, 1 answer
ValueError when using model.fit even with the vectors being aligned
I am attempting to build a naive Bayes model for text classification.
Here is a sample of the data I'm working with:
df_some_observations = filtered_training.sample(frac=0.0001)
df_some_observations.to_dict()
The output looks like this:
{'Intitulé…

Wajih101
4 votes, 2 answers
Naive Bayes Classifier for bag of vectorized sentences
Summary: How to train a Naive Bayes classifier on a bag of vectorized sentences?
Example here :
X_train[0] = [[0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 1]]
y_train[0] = 1
X_train[1] = [[0, 0, 0, 0], [0, 1, 0, 1], [1, 0, 0, 1], [0, 1, 0,…

Clément Perroud
4 votes, 1 answer
Java Spark Naive Bayes - predict for future timestamp
Small question regarding prediction/forecast using SparkML and Naive Bayes please.
I have a very simple dataset, which is just time stamp, representing a day, and how many pancakes sold that day:
dataSetPancakes.show();
+----------+-----+
| …

PatPanda
4 votes, 2 answers
How to get the feature importance in Gaussian Naive Bayes
I have a Gaussian Naive Bayes algorithm running against a dataset. What I need is to get the feature importance (the impact of each feature) on the target class.
Here's my code:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics…

user13456401
4 votes, 1 answer
R saving Naive Bayes for training, R equivalent to Python's pickle.
I have been practicing data analysis in Python and want to do the same in R, particularly sentiment analysis. In Python, after training an NB algorithm I could save it as a pickle and reuse it to continue training; however, I am unsure how I…

OasisTea
4 votes, 2 answers
How to produce a confusion matrix and find the misclassification rate of the Naïve Bayes Classifier?
Using the iris dataset in R, I'm trying to fit a Naïve Bayes classifier to the iris training data so that I can produce a confusion matrix of the training data set (predicted vs. actual) for the classifier. What is the misclassification…

CA Burns
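The confusion matrix and misclassification rate asked about in the question above can be sketched independently of any particular classifier. The question is about R; the following is a minimal Python illustration in which `confusion_matrix` and `misclassification_rate` are hypothetical helpers and the label vectors are made-up toy values, not real iris results.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Rows are actual classes, columns are predicted classes."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix

def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted label differs from the actual one."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

# Toy labels for illustration only.
actual = ["setosa", "setosa", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "virginica", "virginica", "versicolor"]
labels, matrix = confusion_matrix(actual, predicted)
rate = misclassification_rate(actual, predicted)
```

The misclassification rate equals one minus the sum of the matrix diagonal divided by the number of observations, so both quantities come from the same pairs of actual and predicted labels.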