I am still new to R (only a few months in), and I am trying to build a Bayesian network (BN) for my research (biology). I have already done all of this with discrete variables; however, I am now trying to integrate continuous variables as well, which I know can be an issue. Right now I'm just using the bnlearn package to build a BN that I will then use in the MoTBFs package to build a hybrid network. Here is my data:
head(training)
Sample rs12913832 rs16891982 rs12203592 rs1800407 rs3829241 rs1805007 rs1408799 rs683 rs3737576
1 1078 CT GG GG CT GG CC GG TT TT
2 1254 TT CC GG CC GG CC AG GG TT
3 1285 CT GG GG CC GG CC GG TT TT
4 1308 CT GG GG CC AG CT AG GT TT
5 1382 CC GG GG CC AA CT AG GT TT
I have gotten the call below to work, as I'm only targeting a few SNPs from the above data:
bn.bayes.Leye<-mmhc(trainingmotbf2[,c(2:6,8,18,26)])
However, it is not creating the correct arcs, so I'm trying to create a whitelist, which looks like this (L is a column not shown in the subset above):
from to
1 L rs12913832
2 L rs1800407
3 L rs16891982
4 L rs1408799
5 L rs3829241
6 L rs12203952
7 L rs12896399
When I try to pass this whitelist, called white, to the mmhc function, I get an error:
bn.bayes.Leye<-mmhc(trainingmotbf2[,c(2:6,8,18,26)],whitelist=white)
Error in build.whitelist(whitelist, nodes = names(x), data = x, algo = method, :
unknown node label present in the whitelist.
Now, the error is not cryptic, but as far as I can tell all the names in the whitelist are in the data frame. They show up in the BN that is successfully created without the whitelist. I've tried the data as both factors and characters, thinking it had to be in a certain format, but I get the same error. What am I missing?
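For reference, here is a minimal sketch of how the whitelist labels could be compared against the column names of the data passed to mmhc. It uses base R only. The toy data frame dat is a made-up stand-in: its column names are copied from the head(training) output above, L is assumed to be a discrete column, and the values are invented. On the real data, names(trainingmotbf2[,c(2:6,8,18,26)]) would take the place of names(dat).

```r
# Toy stand-in for the subset passed to mmhc(). Column names come from the
# head(training) output; "L" and all values are hypothetical.
dat <- data.frame(
  L          = factor(c("a", "b", "a")),
  rs12913832 = factor(c("CT", "TT", "CT")),
  rs16891982 = factor(c("GG", "CC", "GG")),
  rs12203592 = factor(c("GG", "GG", "GG")),
  rs1800407  = factor(c("CT", "CC", "CC")),
  rs3829241  = factor(c("GG", "GG", "GG")),
  rs1408799  = factor(c("GG", "AG", "GG"))
)

# Whitelist with the labels exactly as typed in the question.
white <- data.frame(
  from = "L",
  to   = c("rs12913832", "rs1800407", "rs16891982", "rs1408799",
           "rs3829241", "rs12203952", "rs12896399"),
  stringsAsFactors = FALSE
)

# Every label used in the whitelist, in either column.
wl_labels <- unique(c(as.character(white$from), as.character(white$to)))

# Labels in the whitelist that are not column names of the data.
missing_labels <- setdiff(wl_labels, names(dat))
print(missing_labels)
```

Any label this prints is one that does not match a column name character for character, so it is worth checking both for typos and for columns that were left out of the subset. Because dat only contains the columns visible in the head output plus L, a label reported here still has to be checked against the real subset.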
Does anyone have experience with, or suggestions for, packages for building BNs with continuous parent nodes and discrete child nodes? Maybe the MoTBFs package is not what I should be using.