
Let's assume I have patient records with information about their diseases and symptoms. I want to estimate the probability P(disease_i = TRUE | symptom_j = TRUE). I suppose I should use a Naive Bayes classifier, but every example I've found applies Naive Bayes when there is only one disease (like predicting the probability of a heart attack).

My data looks like this:

patient | disease | if_disease_present | symptom
1       | d1      | TRUE               | s1
2       | d1      | FALSE              | s2
3       | d2      | TRUE               | s1
4       | d3      | TRUE               | s4
5       | d4      | FALSE              | s8
...

My idea was to split the data by disease and build one Naive Bayes model per unique disease in my data, but I doubt whether this is the proper method.

Has QUIT--Anony-Mousse
ds_fan
  • Can you please rephrase your question? It's kinda hard to understand what the problem is here. Is it that you don't have enough data? Is your naive Bayes solution not working, or do you *think* it might not work? – axiom Sep 05 '17 at 23:07
  • I'm just wondering how I can calculate P(X|Y) when X is something like "I am considering disease3 and the patient suffers from that disease", conditioned on, for example, symptom1. – ds_fan Sep 06 '17 at 00:12
  • To be more precise: I would like to use all the unique diseases as classes, with whether each was present or not as subclasses, in ONE model. – ds_fan Sep 06 '17 at 06:28
  • I've just found that this could be Bernoulli naive Bayes. Is that correct? – ds_fan Sep 06 '17 at 06:50

1 Answer


If you want to predict the disease, don't split the data on it.

That is your target variable!

But as is, your table is not suitable for this task. You need to preprocess it first, probably by pivoting it (for example, into one row per patient).
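A minimal sketch of what such preprocessing could look like, in plain Python. It assumes that a patient may appear in several rows (one per symptom) and that a patient whose flag is FALSE gets the label "none"; the sample rows and the helper names `pivot` and `p_disease_given_symptom` are made up for illustration. Note that this is a direct count-based estimate of P(disease | symptom), not a Naive Bayes model; a classifier only becomes relevant once you condition on several symptoms at once.

```python
from collections import defaultdict

# Hypothetical long-format rows: (patient, disease, if_disease_present, symptom).
rows = [
    (1, "d1", True, "s1"),
    (1, "d1", True, "s2"),
    (2, "d1", False, "s2"),
    (3, "d2", True, "s1"),
    (4, "d3", True, "s4"),
    (5, "d4", False, "s8"),
]

def pivot(rows):
    """Collapse long rows into one record per patient: (label, set of symptoms)."""
    labels = {}
    symptoms = defaultdict(set)
    for patient, disease, present, symptom in rows:
        # Assumption: a FALSE flag means the patient has no confirmed disease.
        labels[patient] = disease if present else "none"
        symptoms[patient].add(symptom)
    return {p: (labels[p], symptoms[p]) for p in labels}

def p_disease_given_symptom(patients, disease, symptom):
    """Empirical P(disease | symptom): share of patients with the symptom who have the disease."""
    with_symptom = [label for label, s in patients.values() if symptom in s]
    if not with_symptom:
        return None  # symptom never observed, so the estimate is undefined
    return with_symptom.count(disease) / len(with_symptom)

patients = pivot(rows)
print(p_disease_given_symptom(patients, "d1", "s1"))
```

After the pivot, the disease label is a single target column and the symptom set can be expanded into binary features, which is the shape a Bernoulli-style Naive Bayes model would expect.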

Has QUIT--Anony-Mousse