1

For a classification problem using BernoulliNB, how do I calculate the joint log-likelihood? The joint likelihood is to be calculated by the formula below, where y(d) is the array of actual outputs (not predicted values) and x(d) is the data set of features.

I read this answer and the documentation, but they did not exactly serve my purpose. Can somebody please help?

[Image: the joint log-likelihood formula]

Aman

2 Answers

1

Looking at the source code, `BernoulliNB` has an undocumented private method, `_joint_log_likelihood(self, X)`, which computes the joint log-likelihood. For each sample it returns one value per class, log P(c) + log P(x | c), with the columns ordered as in `classes_`.

Its implementation is broadly consistent with what you are asking for.
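To make the method less of a black box, here is a minimal sketch that calls it on a small fitted model and rebuilds the same numbers from the public fitted attributes `feature_log_prob_` and `class_log_prior_`. The toy data is made up for illustration, and because `_joint_log_likelihood` is private, its behaviour may change between scikit-learn versions.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy binary data, made up for illustration only.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]])
y = np.array([1, 1, 0, 0, 1, 0])

model = BernoulliNB(alpha=1.0).fit(X, y)

# Private method: one row per sample, one column per class in model.classes_.
jll_sklearn = model._joint_log_likelihood(X)

# Manual reconstruction from the fitted public attributes.
log_p = model.feature_log_prob_           # log P(x_i = 1 | c), shape (n_classes, n_features)
log_not_p = np.log1p(-np.exp(log_p))      # log P(x_i = 0 | c)
jll_manual = (X @ (log_p - log_not_p).T   # correction for the features that are 1
              + log_not_p.sum(axis=1)     # baseline: every feature treated as 0
              + model.class_log_prior_)   # log P(c)

print(np.allclose(jll_sklearn, jll_manual))
```

Each row of the result holds log P(c) + log P(x | c) for every class, so it is not yet the single number the question asks for; that requires picking the column of the true label for each sample and summing.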

Daneel R.
1
The solution is to sum, over all samples, the joint log-likelihood entry that belongs to each sample's true label. For a sample whose label is positive, that is the entry at index 1 of its row, i.e. `data[idx][1]`; otherwise it is the entry at index 0, `data[idx][0]`.

The first block of code does the **training** and *learning*. The second block does the **testing** and *counting*.
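In symbols, and assuming the formula in the question is the sum over samples of the log joint probability of each sample's features and true label, the counting amounts to the expression below. The names $D$ (number of samples) and $p_{i,c}$ (the fitted probability that feature $i$ equals 1 in class $c$) are notation introduced here, not taken from the question.

$$
\sum_{d=1}^{D} \log P\big(y^{(d)}, x^{(d)}\big)
= \sum_{d=1}^{D} \Big[ \log P\big(y^{(d)}\big)
+ \sum_{i} \Big( x_i^{(d)} \log p_{i,\,y^{(d)}} + \big(1 - x_i^{(d)}\big) \log\big(1 - p_{i,\,y^{(d)}}\big) \Big) \Big]
$$

The bracketed term is the entry of the joint log-likelihood array in row $d$ and in the column of the true class $y^{(d)}$.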

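    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import BernoulliNB
    # Xs, ys and r (the random state) are assumed to be defined earlier.
    train, test, train_labels, test_labels = train_test_split(
        Xs[0], ys[0], test_size=1./3, random_state=r)
    naive = BernoulliNB(alpha=1e-7)
    model = naive.fit(train, train_labels)
    # Shape (n_samples, n_classes): log P(c) + log P(x | c) for each class c.
    joint_log_train = model._joint_log_likelihood(train)
    # Each training row with its label appended (not used further below).
    l = [np.append(x, y) for x, y in zip(train, train_labels)]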

    # Write your code below this line.
    def count(data, labels):
        """Sum each sample's joint log-likelihood at its true label's column."""
        total = 0
        for idx, label in enumerate(labels):
            # Binary labels assumed: column 1 is the positive class, column 0 the negative.
            if label:
                total += data[idx][1]
            else:
                total += data[idx][0]
        return total

    # alphas, train_jil and test_jil are assumed to be defined earlier.
    for i, (x, y) in enumerate(zip(Xs, ys)):
        train, test, train_labels, test_labels = train_test_split(
            x, y, test_size=1./3, random_state=r)
        for j, a in enumerate(alphas):
            naive = BernoulliNB(alpha=a)
            model = naive.fit(train, train_labels)
            joint_log_train = model._joint_log_likelihood(train)
            joint_log_test = model._joint_log_likelihood(test)
            train_jil[i][j] = count(joint_log_train, train_labels)
            test_jil[i][j] = count(joint_log_test, test_labels)
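The per-sample loop in `count` can also be written with NumPy indexing. The sketch below is one possible variant, not part of the original answer: the helper name `joint_log_likelihood_of_labels` and the toy array are hypothetical, and it assumes the class array is sorted, as scikit-learn's `classes_` attribute is.

```python
import numpy as np

def joint_log_likelihood_of_labels(jll, labels, classes):
    """Sum jll[d, column of labels[d]] over all samples d.

    jll     -- array of shape (n_samples, n_classes)
    labels  -- true label of each sample
    classes -- sorted class values, one per column of jll (hypothetical argument)
    """
    jll = np.asarray(jll)
    labels = np.asarray(labels)
    # Map every label to the index of its column in the sorted class array.
    cols = np.searchsorted(classes, labels)
    # Pick one entry per row, then add them up.
    return jll[np.arange(len(labels)), cols].sum()

# Toy values, made up for illustration only.
jll = np.array([[-1.0, -2.0],
                [-3.0, -0.5],
                [-0.25, -4.0]])
labels = np.array([True, True, False])

total = joint_log_likelihood_of_labels(jll, labels, classes=np.array([False, True]))
print(total)
```

With a fitted model, `classes` would be `model.classes_`, which avoids hard-coding that column 1 is the positive class.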