I am using Weka to perform a logistic regression on training data where a binary outcome is known. It performs reasonably well, classifying approximately 80% of instances correctly. I also have a data set of current data where the outcome is unknown. When I run the model on the current data and output predictions, it classifies each instance as either Yes or No and provides an error term and a probability distribution term (where error + probability distribution = 1). I am having trouble understanding these results. Can someone help me interpret them? I have noticed that the model only predicts Yes when the probability distribution value is below 0.5. Does that mean I should read 1 minus that value as the probability that the outcome is Yes?
1 Answer
The class probabilities always have to sum to 1. If you had P(Yes)=40% and P(No)=20%, and Yes and No are the only classes, what would the missing 40% be?
Also, if the result says P(Yes)=60% and P(No)=40% and you are asked for a prediction rather than a probability, the rational choice is obviously Yes, because it has the highest probability of all the options. This is the Bayes optimal decision rule. (Thanks to larsmans)
In binary classification problems, this is the same as choosing the answer with P>50%.
Without knowing what your actual output looks like, it does indeed seem as if the probability you get is P(No), since the model predicts Yes exactly when that value is below 0.5.
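To make the interpretation concrete, here is a minimal sketch, assuming (as hypothesized above) that the displayed value is P(No) in a binary Yes/No problem. The function name and the sample values are hypothetical, not Weka output:

```python
def interpret(displayed_prob):
    """Treat the displayed value as P(No); then P(Yes) = 1 - P(No).

    Returns the Bayes optimal prediction (the class with the highest
    probability) together with the recovered P(Yes).
    """
    p_no = displayed_prob
    p_yes = 1.0 - p_no
    # In the binary case, picking the larger probability is the same
    # as predicting Yes whenever p_yes > 0.5 (i.e. displayed value < 0.5).
    prediction = "Yes" if p_yes > p_no else "No"
    return prediction, p_yes

# Hypothetical displayed values:
print(interpret(0.3))   # displayed 0.3 -> predicts "Yes", P(Yes) = 0.7
print(interpret(0.8))   # displayed 0.8 -> predicts "No"
```

This matches the behavior described in the question: the model says Yes exactly when the displayed probability is below 0.5.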

Sentry
+1. FWIW, picking the class with the highest probability (or >½ in the binary case) is called the [Bayes optimal decision rule](http://www.commsp.ee.ic.ac.uk/~vb198/compilation/node3.html). – Fred Foo May 08 '13 at 10:26