1

Goodmorning everyone, This is Riccardo from Italy. This is my first rodeo on StackOverflow's question, so sorry if I'am doing something wrong.

I am also not so expert about coding in R and I am struggling with the following issue: I am try to build a Bayesian Network in R starting from Genetic Expression Data (link for file carm.csv) using a model averaging MMHC (bootstrap on MaxMin Hill-Climbing) to learn a robust structure and a Bayes approach for parameter learning.

All is good util I use the function bn.fit {bnlearn} for parameter learning, as an error message occurs: "Error in bn.fit(...) : the graph is only partially directed.". It is true that the graph in input is not directed, but this is an University homework, and the it is not supposed to find a undirected graph (also, it is strange only one arch, btw).

I tried to search and maybe found this useful stuff Bayesian Networks in R, O'Reilly 2013, p. 35:

“bnlearn 3.2 and later versions are more picky about setting arc directions; as a result bn.gs is an undirected graph and must be extended into a DAG with cextend() to conclude the example.” ...but forcing a graph in a DAG using a function ain't so much funny to me:)

Could you please help me on this? Do you see something wrong on my code?

I attach the code here:


library(bnlearn)
library(Rgraphviz)



###BAYESIAN NETWORKS

#Use a Bayesian network to analyze genetic data (in the file carm.csv) on gene
#expression measured in a series of cytokines in order to assess
#their association with CARM protein expression on a sample of 42 subjects.

#Discretize the data using Hartemink's method (considering 3 final levels
# and 8 initial levels) and proceed with network structure learning
#using a hybrid-type algorithm (Max-Min Hill-Climbing)


#Load:
data <- as.data.frame(read.csv("carm.csv", sep = ";",))

#Discretization:
discret.data<-discretize(data, method='hartemink', breaks=3, ibreaks=8, idisc='quantile')

#Use the model averaging approach to obtain a robust final network
#(again with the mmhc algorithm), using 200 bootstrap samples

ddat<-discret.data

set.seed(123)
boot<-boot.strength(data=ddat,R=200,algorithm="mmhc")
print(boot)
plot(boot)

#Use a threshold of 70% to obtain a network using the averaged.network command,
#and then use the bayes method to do parameter learning:

#Average model:
avg.boot<-averaged.network(boot, threshold=0.70)
print(avg.boot)
plot(avg.boot)

#Parameter learning via Bayes Method:
dag.fit<-bn.fit(avg.boot, ddat, method="bayes")
  • From `help(averaged.network)` : "*averaged.network() typically returns a completely directed graph; an arc can be undirected if and only if the probability of each of its directions is exactly 0.5.*". So its worth inspecting your `boot` object to see which edges are included (above threshold) and have direction = 0.5. What to do? The `mmhc` returns a directed graph, so I thought using an odd number of replications would sort this; but no. A quick look at the help page shows the `cpdag=TRUE` default returns the equivalence class. Setting this to `FALSE` should then help (in this case). – user20650 May 23 '22 at 20:44
  • ... or you could try to use `?cextend`; but there is no guarantee that it will be possible to get a fully directed graoh. – user20650 May 23 '22 at 20:46

1 Answers1

0

averaged.network may return a partially directed graph in some situations when the probability is exactly 0.5 for both the two possible directions of the arc. One thing you can try is increasing the number of bootstrap replicates from 200 to let's say 500. That means 500 DAGs will be created from bootstrapped data in order to find the most likely arcs and directions. The more DAGs in your bootstrapping, the greater the chances you can "break the tie" and direct the arcs.

If that doesn't work, you still have an undirected arc and you need to deal with it. There are many ways of doing that, which includes experimenting with other algorithms. The solution really depends on your use case and domain. But, generally speaking, for many domains and use cases, you can simply ignore that arc and you'll still have valid DAG that can be trained to encode the joint probability of your variables. This code will do that, and will certainly make your code work, if you feel good disregarding that arc:

avg.boot<-averaged.network(boot, threshold=0.70)
arcs(avg.boot) <- directed.arcs(avg.boot) # ignore undirected arcs

Your bn.fit will certainly work now that you've removed the undirected arcs.

  • The issue is that `averaged.network` returns the equivalence class of the network, by default (it can be turned off). Hence, there will often be undirected edges. Just dropping these edges doesn't seem like a good idea. Perhaps better ideas would be to use knowledge to direct the edge. – user20650 Jun 15 '22 at 15:37