bayesian networks with the catnet package: handling missing data

Question

I am new to this community, r, and programming in general. (Thanks in advance for your patience!) I am working on a project that involves bayesian-networks.

Strait to the question. The following code was posted on this site in response to a question titled "NA/NaN values in bnlearn package R"

rm(list=ls())

### generate random data (not simply independent binomials)
set.seed(123)
n.obs <- 10
a1 <- rbinom(n.obs,1,.3)
a2 <- runif(n.obs)
a3 <- floor(-3*log(.25+3*a2/4))
a3[a3>=2] <- NA
a2 <- floor(2*a2)
my.data <- data.frame(a1,a2,a3 )
### discretize data into proper categories
my.data <- cnDiscretize(my.data,numCategories=2)

my.data
##    a1 a2 a3
## 1   1  2  1
## 2   2  1  2
## 3   1  2  1
## 4   2  2  2
## 5   2  1 NA
## 6   1  2  1
## 7   1  1 NA
## 8   2  1 NA
## 9   1  1 NA
## 10  1  2  1

## say we want a2 conditional on a1,a3

## first generate a network with a1,a3 ->a2
cnet <- cnNew(
      nodes = c("a1", "a2", "a3"),
      cats = list(c("1","2"), c("1","2"), c("1","2")),
      parents = list(NULL, c(1,3), NULL)
      )


## set the empirical probabilities from data=my.data
cnet2 <- cnSetProb(cnet,data=my.data)

## to get the conditional probability table
cnProb(cnet2,which='a2')

##$a2
##         a1        a3         0         1
## A 0.0000000 0.0000000 0.0000000 1.0000000
## B 0.0000000 1.0000000 0.5712826 0.4287174
## A 1.0000000 0.0000000 0.0000000 1.0000000
## B 1.0000000 1.0000000 0.5685786 0.4314214

However when I copy, paste and run the code I get a different result (see below).

rm(list=ls())

### generate random data (not simply independent binomials)
set.seed(123)
n.obs <- 10
a1 <- rbinom(n.obs,1,.3)
a2 <- runif(n.obs)
a3 <- floor(-3*log(.25+3*a2/4))
a3[a3>=2] <- NA
a2 <- floor(2*a2)
my.data <- data.frame(a1,a2,a3 )
### discretize data into proper categories
my.data <- cnDiscretize(my.data,numCategories=2)

my.data
##   a1 a2 a3
## 1   1  2  1
## 2   2  1  2
## 3   1  2  1
## 4   2  2  2
## 5   2  1 NA
## 6   1  2  1
## 7   1  1 NA
## 8   2  1 NA
## 9   1  1 NA
## 10  1  2  1

## say we want a2 conditional on a1,a3 
## first generate a network with a1,a3 ->a2
cnet <- cnNew(
    nodes = c("a1", "a2", "a3"),
    cats = list(c("1","2"), c("1","2"), c("1","2")),
    parents = list(NULL, c(1,3), NULL)
    )


## set the empirical probabilities from data=my.data
cnet2 <- cnSetProb(cnet,data=my.data)

## to get the conditional probability table
cnProb(cnet2,which='a2')
## $a2
##   a1  a3   1   2
## A 1.0 1.0 0.0 1.0
## B 1.0 2.0 0.5 0.5
## A 2.0 1.0 0.5 0.5
## B 2.0 2.0 0.5 0.5

Could someone explain why my results are different? I ask because I am trying to understand how catnet handles missing data.

Best,

John

score 0 · Answer 1 · answered Jun 30 '16 at 16:39

The top/bottom code are identical--they should output the same results. I checked through the catnet functions for other packages which use the same function -- could be your issue. It's good practice to use the :: notation when using non-base functions.

rm(list=ls())
library(catnet)

### generate random data (not simply independent binomials)
set.seed(123)
n.obs <- 10
a1 <- rbinom(n.obs,1,.3)
a2 <- runif(n.obs)
a3 <- floor(-3*log(.25+3*a2/4))
a3[a3>=2] <- NA
a2 <- floor(2*a2)
my.data <- data.frame(a1,a2,a3 )
### discretize data into proper categories
my.data <- catnet::cnDiscretize(my.data,numCategories=2)

my.data
##    a1 a2 a3
## 1   1  2  1
## 2   2  1  2
## 3   1  2  1
## 4   2  2  2
## 5   2  1 NA
## 6   1  2  1
## 7   1  1 NA
## 8   2  1 NA
## 9   1  1 NA
## 10  1  2  1

## say we want a2 conditional on a1,a3

## first generate a network with a1,a3 ->a2
cnet <- catnet::cnNew(
  nodes = c("a1", "a2", "a3"),
  cats = list(c("1","2"), c("1","2"), c("1","2")),
  parents = list(NULL, c(1,3), NULL)
)


## set the empirical probabilities from data=my.data
cnet2 <- catnet::cnSetProb(cnet,data=my.data)

## to get the conditional probability table
catnet::cnProb(cnet2,which='a2')

# $a2
# a1  a3   1   2
# A 1.0 1.0 0.0 1.0
# B 1.0 2.0 0.5 0.5
# A 2.0 1.0 0.5 0.5
# B 2.0 2.0 0.5 0.5

If you place `catnet::` in front of the relevant function, it should work for you in both runs. Try it with both, and see if you're using a shared function. — Bob Hopez, Jun 30 '16 at 17:29
Thanks for your help, Bob! Is that the output you got when you ran it? The table at the end is what I'm interested in. The table in your post is the same as the one I got, but different than that in the original post. It appears from my output (and the output in your post) that the rows containing missing data (my.data) are simply ignored when calculating the conditional probabilities (given in the table at the end). This appears not to be the case in the original post. Anyway, thanks again for helping me get my head around this stuff. — John, Jun 30 '16 at 17:33
The notation works, thanks again Bob! However it still gives me different output than the original poster. Maybe he/she was using a shared function... Is there a way to ask this "original poster" (@pes) I keep mentioning? I would have commented on his/her post if I was able to. — John, Jun 30 '16 at 17:50
Running this should help you manually check: `ftable(my.data, row.vars = 1:3, exclude = F)` — Bob Hopez, Jun 30 '16 at 19:30

bayesian networks with the catnet package: handling missing data

1 Answers1

Linked