
How can I do Bayesian structure learning and inference for continuous variables with R?

I was using the 'bnlearn' package as follows:

For structure learning using the Hill-Climbing algorithm, I do the following:

mynetwork = hc(dataset,score='bic',restart = 0)
mynetwork.fitted = bn.fit(mynetwork , dataset, method='bayes')
mynetwork.grain <<- as.grain(mynetwork.fitted)

For inference, I do the following:

predict(mynetwork.grain, response = c("myresponsevariable"), newdata = mytestdata, predictors = mypredictors, type = "distribution")$pred$myresponsevariable

This gives me output like the following, with one column per possible state of the response variable:

             0         1
[1,] 0.8745255 0.1254745

However, the trouble is that this works only for categorical data (factors). When I use it on a dataset that has continuous variables (integer, numeric, etc.), it fails with an error.

By looking at the source of the package (line 666 and lines 418-419), I could see that the package expects all its input columns to be of type factor.

Is there an alternative method (package) for building Bayesian networks in R, with which I can do network learning and inference when some of the columns are continuous (numeric)?

EDIT: Thanks, I have tried the following as per user2957945's suggestion, and it works. However, is there a way to get the probabilities of states 0 and 1 for the prediction? For example, say I'm trying to predict for df[3,], and it gives me the prediction 0. I would also like to see the probability of it being 0 and the probability of it being 1, so that I can set a threshold. See my original post, where I have shown some sample probabilities.

# making the data set
col1 = c(1,2,3,4,5)
col2 = c(10,20,30,40,50)
col3 = c(0,1,1,0,0)
df = data.frame(col1, col2, col3)
df$col3 = as.factor(df$col3)

# learning the network
hcbn = hc(df)
hcbn.fit = bn.fit(hcbn, df)

# doing the prediction
> predict(hcbn.fit, "col3", method = "bayes-lw", data = df[3,])
[1] 0
Levels: 0 1
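To recover the class probabilities rather than just the predicted level, one option discussed in the comments is `cpquery`. Here is a minimal sketch for row 3 of the toy `df` above; note that the interval half-width `eps` is an arbitrary choice for this sketch, since under a Gaussian node the probability of a continuous variable taking an exact value is zero, so evidence has to be stated as an interval:

```r
library(bnlearn)

# rebuild the toy network from the edit above
col1 <- c(1, 2, 3, 4, 5)
col2 <- c(10, 20, 30, 40, 50)
col3 <- factor(c(0, 1, 1, 0, 0))
df <- data.frame(col1, col2, col3)
hcbn.fit <- bn.fit(hc(df), df)

# Approximate P(col3 == 0 | col1 near 3, col2 near 30) for row 3 by
# conditioning on small intervals around the observed values;
# eps is an arbitrary half-width chosen for this sketch.
eps <- 0.5
p0 <- cpquery(hcbn.fit,
              event    = (col3 == "0"),
              evidence = (col1 > 3 - eps) & (col1 < 3 + eps) &
                         (col2 > 30 - eps) & (col2 < 30 + eps))
p0   # Monte Carlo estimate of P(col3 == 0); 1 - p0 estimates P(col3 == 1)
```

Because `cpquery` is simulation-based, the estimate varies between runs; increasing its `n` argument (the number of samples) tightens it. One could then report the prediction only when `p0` (or `1 - p0`) clears the desired threshold.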
  • You can try `JAGS`, `stan` and their respective `R` packages `rjags` and `rstan`. However, I suggest you learn Bayesian networks in depth to understand what the difference is between a discrete net and a continuous one, how one can handle continuous values, and the difference between exact inference and sampling from a net. – nicola Dec 18 '15 at 08:26
  • bnlearn has approximate methods.. `cpquery` and a `predict` method. – user2957945 Dec 18 '15 at 09:38
  • @user2957945, thanks for the reply. Hmm, I understand bnlearn has a predict method, and I am using it currently. However, as I said, it does not work with continuous variables. Could you maybe give me an example snippet where it can be made to work with continuous variables? – tubby Dec 18 '15 at 20:25
  • It does work for continuous variables... `m <- hc(gaussian.test) ; ft <- bn.fit(m, gaussian.test) ; pred <- predict(ft, "A", method="bayes-lw", data=gaussian.test)` . Also look at the last example in `?cpquery`, which shows an inference example, given some evidence. – user2957945 Dec 18 '15 at 22:00
  • 1
    The BIC score notation for gaussian models is `bic-g` . See `help("bnlearn-package")` for available scores (a quick way to see the list `hc(gaussian.test, score="fakename")`) – user2957945 Dec 18 '15 at 22:01
  • @user2957945, thanks. Please see my edit above. I tried out the m <- hc(gaussian.test) ; ft <- bn.fit(m, gaussian.test) ; pred <- predict(ft, "A", method="bayes-lw", data=gaussian.test) you mentioned above and it works. However, is there a way to get the probabilities of each of the classes, instead of just the prediction. – tubby Dec 19 '15 at 09:36
  • @PepperBoy; first we can see if we can get your question reopened - but you will need to change the wording of the sentence that starts `Is there an alternative method (package) ...`. (I think) this is why the question was closed as it is off-topic to request such. I think this is a bit much given the rest of the content of your question, but it may be worth removing it or rephrasing it, so that it may be reopened. – user2957945 Dec 19 '15 at 14:06
  • It is a bit clunky in bnlearn to get the posterior probabilities. There is an example in `?cpquery` that shows how to use `sapply` to loop across the rows, assigning the observed data as evidence to make predictions on another variable. However, remember that for Gaussian / conditional-Gaussian models you can't specify the exact value of the continuous data as evidence (as the probability of being equal to this value is zero). – user2957945 Dec 19 '15 at 14:28
  • So writing explicitly, for the first observation, you will want something like `cpquery(hcbn.fit, (col3=='0'), (col1 < '1') & (col2 < '10'))` to get the probability (as col1 and col2 are continuous). You can of course automate this as in the help page. If you have a large number of variables and observations it may be worth parallelizing this when looping through the rows of you data. [ps the data in your predict statement in your edit should be `data=df` ] – user2957945 Dec 19 '15 at 14:30
  • @user2957945, thanks for helping. But I did try out the cpquery function you mention above. Hmm, isn't there an easier way to get the probability of the response variable being 0 and 1 respectively? I want to report a prediction only if the probability is above a certain threshold. When I tried out cpquery with exact values, like cpquery(hcbn.fit, (col3=='0'), (col1 == '1') & (col2 < '10')), I always seem to get 0. I'm not very sure if what it's reporting is actually true. – tubby Dec 20 '15 at 00:04
  • See the comment two above. As col1 is numeric, it is modelled as normally distributed, so the probability of it being exactly equal to one is zero; hence P(col3) is always zero. I don't think there is an easier way. But I see the problem: if you had used, for example, logistic regression you could examine the predictions given the input variables exactly, and similarly if all variables were discrete in your BN, but I am not sure how you would conceptually do this for a conditional-Gaussian BN. I think it may be worth asking on http://stats.stackexchange.com/questions – user2957945 Dec 20 '15 at 15:31
  • try `%>% mutate_if(is.numeric, as.factor)` – Kamaldeep Singh Jan 09 '18 at 21:39
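Putting the suggestions in these comments together, here is a minimal sketch of the fully continuous workflow, using the `gaussian.test` data bundled with bnlearn and the Gaussian BIC score `bic-g`:

```r
library(bnlearn)

# gaussian.test ships with bnlearn; every column is numeric
m  <- hc(gaussian.test, score = "bic-g")   # Gaussian BIC, per the comment above
ft <- bn.fit(m, gaussian.test)

# predict node A by likelihood weighting, conditioning on the other columns
pred <- predict(ft, node = "A", data = gaussian.test, method = "bayes-lw")
head(pred)
```

For a discrete target node, more recent bnlearn versions also accept `prob = TRUE` in `predict(..., method = "bayes-lw")`, which attaches the per-class probabilities to the result as an attribute; whether that option is available depends on the installed version.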

0 Answers