
I have done some simple Bayesian classification:

from sklearn.naive_bayes import BernoulliNB

X = [[1, 0, 0], [1, 1, 0]]  # there are more data of course
Y = [1, 0]

classifier = BernoulliNB()
classifier.fit(X, Y)

Now I have received some "insider tips" that the first element in every X is more important than the others.

  1. Can I incorporate this knowledge before I train the model?

  2. If sklearn doesn't allow it, is there another classifier or library that lets me incorporate this prior before training?

aerin
dgg32
    Can you describe your "insider tips?" Could you add that information as a feature, somehow? I'm thinking you could also multiply the first element for each observation by different values to make that feature "more important," but am not sure about how "best practice-y" that might be. – blacksite Mar 24 '17 at 12:21

2 Answers


I do not know the answer to question 2, but I can answer question 1.

The approach suggested in the comment, "multiply the first element for each observation by different values", is wrong.

When you are using BernoulliNB (or a binomial model in general), the way to incorporate prior knowledge is to add it to the sample (data).

Let's say you are flipping a coin and you know it is rigged toward heads. Then you add samples that show more heads. If your prior knowledge says 70% heads and 30% tails, you can add 100 samples in total, 70 heads and 30 tails, to your data X.
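A minimal sketch of this pseudo-count idea, using made-up coin data (the single always-on feature is just a placeholder so BernoulliNB can fit):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Observed data: 10 flips, 5 heads (1) and 5 tails (0).
X = np.ones((10, 1))
y = np.array([1] * 5 + [0] * 5)

# Prior belief: 70% heads. Encode it as 100 pseudo-flips.
X_prior = np.ones((100, 1))
y_prior = np.array([1] * 70 + [0] * 30)

clf = BernoulliNB()
clf.fit(np.vstack([X, X_prior]), np.concatenate([y, y_prior]))

# The learned class prior now reflects the pseudo-counts:
# 35/110 tails vs 75/110 heads, roughly [0.32, 0.68].
print(np.exp(clf.class_log_prior_))
```

The more pseudo-samples you add relative to the real data, the more the fitted prior is pulled toward your belief.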

aerin
  • But by adding artificial data, do you mean adding replicates? In my case, should I add more [1,0,0] or [1,1,0]? Besides, wouldn't these replicates be collinear with the originals and hence get ignored by the algorithm? – dgg32 Jul 04 '17 at 11:51
  • It's not adding "artificial" data. You said "### there are more data of course". Add those more "real" data. They are NOT replicates and it won't get ignored by the Bernoulli NB algorithm. Your insider information (that the first element is more important than others) should be backed by data not by artificially inflating the first item of the data. – aerin Jul 04 '17 at 17:25
  • @dgg32 Take a look at "class_prior" param in sklearn.naive_bayes.BernoulliNB documentation. – aerin Jul 04 '17 at 17:29
  • Prior is equivalent to saying that I've seen some data in the past that shaped my prior belief. So adding your data to prior or your data X is pretty much the same. (Of course if you add them to your prior knowledge, you can't add them to X again.) – aerin Jul 04 '17 at 17:45
  • I don't think "class_prior" has anything to do with the subject here. It sets the prior for "y"s, but what I want is to set the weights for "x"s. – dgg32 Jul 06 '17 at 11:31
  • The thing is, my insider knowledge is "independent" of the data. So yes, I have a dataset, but my insider told me that the first variable is more important, even if my current data don't show it right now. – dgg32 Jul 06 '17 at 11:36
  • @dgg32 - One rudimentary way is to add some constructed data where the first feature is correlated with the outcome on purpose while the second one isn't. This is your "prior data" – Rohit Pandey Jul 06 '17 at 20:23
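The "constructed prior data" idea from the last comment could be sketched like this (hypothetical data; feature 0 is deliberately tied to the label while the others are noise):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)

# Hypothetical prior data: feature 0 matches the label exactly,
# features 1 and 2 are random noise.
n = 50
y_prior = rng.integers(0, 2, size=n)
X_prior = np.column_stack([
    y_prior,                      # feature 0: correlated with the outcome
    rng.integers(0, 2, size=n),   # feature 1: noise
    rng.integers(0, 2, size=n),   # feature 2: noise
])

clf = BernoulliNB()
clf.fit(X_prior, y_prior)

# feature_log_prob_[k][j] = log P(x_j = 1 | class k); the gap between
# the two classes is largest for feature 0, i.e. the model now treats
# it as the most informative feature.
print(clf.feature_log_prob_[1] - clf.feature_log_prob_[0])
```

In actual use you would stack this constructed data on top of your real X and Y before fitting, so the injected correlation acts as the prior.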

Think about what the algorithm is actually doing. Naive Bayes performs the following classification:

p(class = k | data) ~ p(class = k) * p(data | class = k)

In words: The (posterior) probability of an observation being in class k is proportional to the probability of any observation being in class k (that's the prior) times the probability of seeing the observation, given it came from class k (the likelihood).

Usually when we don't know anything, we assume that p(class = k) just reflects the distribution of the observed data.

In your case, you're saying that you have some information, in addition to the observed data, that leads you to believe the prior, p(class = k), should be amended. This is perfectly legitimate; in fact, that's the beauty of Bayesian inference. Whatever your prior knowledge is, you should incorporate it into this term. So in your case, perhaps that means increasing the probability of being in a particular class (i.e. increasing its weight, as suggested in the comments), if you know it is more likely to occur than the data suggests.
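In scikit-learn, the p(class = k) term can be set directly via BernoulliNB's class_prior parameter instead of being estimated from the data (the data and prior values below are made up for illustration):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
y = [1, 0, 1, 0]

# Fix the prior at p(class = 0) = 0.3 and p(class = 1) = 0.7,
# overriding the empirical class frequencies (here 0.5 / 0.5).
clf = BernoulliNB(class_prior=[0.3, 0.7])
clf.fit(X, y)

print(np.exp(clf.class_log_prior_))  # ≈ [0.3, 0.7]
```

Note that class_prior shifts the prior over classes (the y's); it does not reweight individual features, which is what the question's comments point out.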

ilanman