0

Here is my problem which I hope you can help me with:

Lets say we live in a world where there are only two categories, where each has some features. The objects in this world are different permutations of these features.

cat1: {a, b, c, d, e, f}

cat2: {g, h, i, j}

Now we have an object with these features:

obj: {a, b, c, d, g, h}

What is the probability that this object gets categorized as Cat.1? p(cat1|a, b, c, d, g, h)?

In general how can I model an equation for: n categories each with different number of features, objects with different permutations?

Barbaletta
  • 13
  • 3

1 Answers1

0

You can use a Bayesian classifier to calculate these probabilities. To fit a normal distribution as it is the most used distribution for Bayesian classifiers see this link. However, to use this solution you need to assume that every object holds a value for all of your features for example:

obj1:{a=1, b=1, c=1, d=1, e=0, f=0, g=1, h=1, i=0, j=0}

Note that feature values must be normalized before calculating parameters of your distribution.

Community
  • 1
  • 1
Alireza
  • 135
  • 1
  • 9
  • Thank you Alireza! Would u conceptually help me? my problem is that I cannot construct it manualy to make sense of it. For instance what are the liklihood and prior probability in this specific scenario? I'm at that level of basic... – Barbaletta Nov 30 '15 at 17:48
  • Lets do it with an example. Consider we have 2 categories: M=male and F=female. Now we are going to train a Bayesian classifier that for a new observation X tells us the probability of P(C=M | X) and P(C=F | X). As you know the conditional probability can be written as P(C=M | X) = P(M) * P(X|C=M) / P(X) = Prior * Likelihood / Evidence. Consider our features are defined like this: weight, height, voice frequency band width and etc. You need to have a train Database which means already observed data from your world. for example you have seen 1000 persons 700 male and 300 female. – Alireza Nov 30 '15 at 18:11
  • Then the prior probability will be P(M) = 700/1000 and P(F) = 300/1000. To calculate likelihood you need to fit a distribution to your observed database as I gave you the link. Read Wikipedia page for Naive Bayesian Classifier, it is good. – Alireza Nov 30 '15 at 18:18
  • Thank you again, I read the wiki page and calculated the probabilities but since P(g|cat1)=0 the numerator becomes 0 as well. So do I have to distrbute the prob across two categories? e.g. P(g|cat1)=0.1 and P(g|cat2)=0.9? The other problem is with p(cat1), in men and women example they all share the same features but with different values. In my example cat.1 has 6 features and cat.2 has 4 features, is P(cat1)=P(cat2)=0.5 or is it P(cat1)=0.6 and P(cat2)=0.4? – Barbaletta Nov 30 '15 at 18:53
  • The prior probability should be calculated according to your sample data number not the features! the number of data you have in each category. But about the features I haven't seen such problem. You need to define and extract all of your features for both categories if you want to solve your problem with classification approach. – Alireza Nov 30 '15 at 19:13
  • The conditional probabilities you defined are not correct you are not going to calculate P(g|cat1) if g is a feature this probability is not correct. It seems that your feature definition has some problems. You cannot have different features for different categories and then have combination of them in observation. – Alireza Nov 30 '15 at 19:18
  • Isn't it the case in reality that some features are exclusive to one category for a given object while other features belong to another category? like between category objects? Then they are categorized based on their similarity to the categories: in a very abstract situation like my example obj1 is perceived as a member of cat1 because 4/6 of its features belong to cat1. – Barbaletta Nov 30 '15 at 19:54
  • When you have exclusive features for one category in reality then observing it in an object means that object belongs to the specified category with probability 1. For the case of male and female consider that we define features which are exclusive for men and women for instance amount of children females can give birth in one labor or features like this. Then you will never have an observation which could be a man and also has this feature! – Alireza Nov 30 '15 at 21:32