
How can I do Bayesian structure learning and inference for continuous variables with R?

I was using the 'bnlearn' package as follows:

For structure learning using the Hill-Climbing algorithm, I do the following:

mynetwork = hc(dataset,score='bic',restart = 0)
mynetwork.fitted = bn.fit(mynetwork , dataset, method='bayes')
mynetwork.grain <<- as.grain(mynetwork.fitted)

For inference, I do the following:

predict(mynetwork.grain, response = c("myresponsevariable"), newdata = mytestdata, predictors = mypredictors, type = "distribution")$pred$myresponsevariable

This gives me output like the following, with one column per possible state of the response variable:

             0         1
[1,] 0.8745255 0.1254745

However, the trouble is that this works only for categorical data (factors). When I use it on a dataset that has continuous variables (integer, numeric, etc.), it fails with an error.

By looking at the source of the package (line 666 and lines 418-419), I could see that the package expects all its input columns to be of type factor.

Is there an alternative method (package) for building Bayesian networks in R, with which I can do network learning and inference when some of the columns are continuous (numeric)?

EDIT: Thanks, I have tried the following as per user2957945's suggestion, and it works. However, is there a way to get the probabilities of states 0 and 1 for the prediction? For example, say I'm trying to predict for df[3,], and it gives me the prediction 0. I would also like to see the probability of it being 0 and the probability of it being 1, so that I can set a threshold. See my original post, where I have shown some sample probabilities.

# making the data set
col1 = c(1,2,3,4,5)
col2 = c(10,20,30,40,50)
col3 = c(0,1,1,0,0)
df = data.frame(col1, col2, col3)
df$col3 = as.factor(df$col3)

# learning the network
hcbn = hc(df)
hcbn.fit = bn.fit(hcbn, df)

# doing the prediction
> predict(hcbn.fit, "col3", method = "bayes-lw", data = df[3,])
[1] 0
Levels: 0 1
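To recover the class probabilities rather than just the predicted level, one option discussed in the comments is `cpquery`. Here is a minimal sketch for row 3 of the toy `df` above; note that the interval half-width `eps` is an arbitrary choice for this sketch, since under a Gaussian node the probability of a continuous variable taking an exact value is zero, so evidence has to be stated as an interval:

```r
library(bnlearn)

# rebuild the toy network from the edit above
col1 <- c(1, 2, 3, 4, 5)
col2 <- c(10, 20, 30, 40, 50)
col3 <- factor(c(0, 1, 1, 0, 0))
df <- data.frame(col1, col2, col3)
hcbn.fit <- bn.fit(hc(df), df)

# Approximate P(col3 == 0 | col1 near 3, col2 near 30) for row 3 by
# conditioning on small intervals around the observed values;
# eps is an arbitrary half-width chosen for this sketch.
eps <- 0.5
p0 <- cpquery(hcbn.fit,
              event    = (col3 == "0"),
              evidence = (col1 > 3 - eps) & (col1 < 3 + eps) &
                         (col2 > 30 - eps) & (col2 < 30 + eps))
p0   # Monte Carlo estimate of P(col3 == 0); 1 - p0 estimates P(col3 == 1)
```

Because `cpquery` is simulation-based, the estimate varies between runs; increasing its `n` argument (the number of samples) tightens it. One could then report the prediction only when `p0` (or `1 - p0`) clears the desired threshold.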
  • You can try `JAGS`, `stan` and their respective `R` packages `rjags` and `rstan`. However, I suggest you learn Bayesian networks in depth to understand what the difference is between a discrete net and a continuous one, how one can handle continuous values, and the difference between exact inference and sampling from a net. – nicola Dec 18 '15 at 08:26
  • bnlearn has approximate methods.. `cpquery` and a `predict` method. – user2957945 Dec 18 '15 at 09:38
  • @user2957945, thanks for the reply. Hmm, I understand bnlearn has a predict method, and I am using it currently. However, as I said, it does not work with continuous variables. Could you maybe give me an example snippet where it can be made to work with continuous variables? – tubby Dec 18 '15 at 20:25
  • It does work for continuous variables... `m <- hc(gaussian.test) ; ft <- bn.fit(m, gaussian.test) ; pred <- predict(ft, "A", method="bayes-lw", data=gaussian.test)` . Also look at the last example in `?cpquery`, which shows an inference example, given some evidence. – user2957945 Dec 18 '15 at 22:00
  • 1
    The BIC score notation for gaussian models is `bic-g` . See `help("bnlearn-package")` for available scores (a quick way to see the list `hc(gaussian.test, score="fakename")`) – user2957945 Dec 18 '15 at 22:01
  • @user2957945, thanks. Please see my edit above. I tried out the m <- hc(gaussian.test) ; ft <- bn.fit(m, gaussian.test) ; pred <- predict(ft, "A", method="bayes-lw", data=gaussian.test) you mentioned above and it works. However, is there a way to get the probabilities of each of the classes, instead of just the prediction. – tubby Dec 19 '15 at 09:36
  • @PepperBoy; first we can see if we can get your question reopened - but you will need to change the wording of the sentence that starts `Is there an alternative method (package) ...`. (I think) this is why the question was closed as it is off-topic to request such. I think this is a bit much given the rest of the content of your question, but it may be worth removing it or rephrasing it, so that it may be reopened. – user2957945 Dec 19 '15 at 14:06
  • It is a bit clunky in bnlearn to get the posterior probabilities. There is an example in `?cpquery` that shows how to use `sapply` to loop across the rows, assigning the observed data as evidence to make predictions on another variable. However, remember that for Gaussian / conditional-Gaussian models you can't specify the exact value of the continuous data as evidence (as the probability of being equal to this value is zero). – user2957945 Dec 19 '15 at 14:28
  • So writing explicitly, for the first observation, you will want something like `cpquery(hcbn.fit, (col3=='0'), (col1 < '1') & (col2 < '10'))` to get the probability (as col1 and col2 are continuous). You can of course automate this as in the help page. If you have a large number of variables and observations it may be worth parallelizing this when looping through the rows of you data. [ps the data in your predict statement in your edit should be `data=df` ] – user2957945 Dec 19 '15 at 14:30
  • @user2957945, thanks for helping. But I did try out the cpquery function you mention above. Hmm, isn't there an easier way to get the probability of the response variable being 0 and 1 respectively? I want to report a prediction only if the probability is above a certain threshold. When I tried out cpquery with exact values, like cpquery(hcbn.fit, (col3=='0'), (col1 == '1') & (col2 < '10')), I always seem to get 0. I'm not very sure if what it's reporting is actually true. – tubby Dec 20 '15 at 00:04
  • See the comment two above. As col1 is numeric, it is modelled as normally distributed, so the probability of it being exactly equal to one is zero; hence P(col3) is always zero. I don't think there is an easier way. But I see the problem: if you had used, for example, logistic regression you could examine the predictions given the input variables exactly, and similarly if all variables were discrete in your BN, but I am not sure how you would conceptually do this for a conditional-Gaussian BN. I think it may be worth asking on http://stats.stackexchange.com/questions – user2957945 Dec 20 '15 at 15:31
  • try `%>% mutate_if(is.numeric, as.factor)` – Kamaldeep Singh Jan 09 '18 at 21:39
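Putting the suggestions in these comments together, here is a minimal sketch of the fully continuous workflow, using the `gaussian.test` data bundled with bnlearn and the Gaussian BIC score `bic-g`:

```r
library(bnlearn)

# gaussian.test ships with bnlearn; every column is numeric
m  <- hc(gaussian.test, score = "bic-g")   # Gaussian BIC, per the comment above
ft <- bn.fit(m, gaussian.test)

# predict node A by likelihood weighting, conditioning on the other columns
pred <- predict(ft, node = "A", data = gaussian.test, method = "bayes-lw")
head(pred)
```

For a discrete target node, more recent bnlearn versions also accept `prob = TRUE` in `predict(..., method = "bayes-lw")`, which attaches the per-class probabilities to the result as an attribute; whether that option is available depends on the installed version.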

0 Answers