
I have built a Bayesian belief network using the bnlearn package. It consists of 40 factor variables, each with between 2 and 16 levels. I specified the graph manually using modelstring(), and when I run bn.fit on the model I get this error: "Error in minimal.table(data[, c(node, parents), drop = FALSE], with.missing = !all(data.info$complete.nodes[c(node, : attempting to create a table with more than INT_MAX cells". Is there a way to avoid this without removing any variables from my data set?

I looked up the INT_MAX value for my compiler, and it is 2,147,483,647.

AnT
  • Welcome to stackoverflow. Please edit your question to provide a reproducible example. See https://stackoverflow.com/help/how-to-ask for more guidance. – Simon.S.A. May 29 '19 at 00:02
  • I believe that this occurs when a CPT has too many cells; you are likely going to have to restrict the number of parents (or children) of the offending node or collapse some levels. – user20650 May 30 '19 at 11:59
  • Thanks @user20650. Yeah, it looks like I will have to reduce the number of variables and factor levels. While doing some research I came across some BNs with a really large number of nodes and was wondering what system could have been used to generate them. My RStudio Desktop is really struggling. – AnT May 30 '19 at 23:59
  • @AnT: it is not that you need to reduce the total number of variables; you need to reduce the number of parents. I am currently learning BNs with 1500 variables, but I had to restrict the number of parents (the maxp argument), or else I got the same error due to the CPT construction. ... – user20650 May 31 '19 at 08:00
  • ... But you also have to think about the estimation of the CPTs of this size, with so many levels - unless you have millions and millions of observations you are not going to be able to get a decent estimate (lots of zeros if using ML so prohibiting evidence propagation, and fairly uniform if using a Bayes estimate) – user20650 May 31 '19 at 08:00
  • Thanks @user20650! I am new to BNs, but I think I understand what you're saying. I am testing some approaches: limiting the variables, reducing the factor levels for some of the variables, and changing the BN graph structure a bit to reduce the number of parents. This is based on a business use case, so there is not a lot of room for modifications, but hopefully one of the approaches will work. Do you have any reference docs or articles on Bayesian networks in R that you could suggest? – AnT May 31 '19 at 19:46
  • I don't know if you have seen this, but the author of bnlearn has written a couple of books; these are applied rather than theoretical. He also provides presentations, university notes, etc. at http://www.bnlearn.com/about/. By the way, you will be able to identify which node(s) are causing the problem, or which will have too many values to estimate reasonably ... – user20650 May 31 '19 at 20:06
  • ... for example, if you have a net `m = hc(learning.test) ; nd = nodes(m)`, you can see how many entries a CPT will need with `nd_par = sapply(seq_along(nd), function(i) c(nd[i], parents(m, nd[i])))`; then this will give the number of cells for each node: `setNames(sapply(nd_par, function(i) prod(sapply(learning.test[i], nlevels))), nd)` (there will be a better way to do this!) – user20650 May 31 '19 at 20:08
  • I tried, but I am getting this error for nd_par: "Error in graphNEL2M(object) : 'gn' must be a graphNEL object". `hc <- hc(dataset)` and `nd <- nodes(hc)` run fine. – AnT May 31 '19 at 22:17
  • Have you loaded Rgraphviz or another graph package? If so, either load bnlearn after Rgraphviz or use `bnlearn::nodes(m)`. – user20650 May 31 '19 at 22:33
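
The check described in the comments above can be sketched in base R, so that it runs without bnlearn or any graph package loaded. This is only a minimal illustration under stated assumptions: `cpt_cells`, `toy`, and `parent_sets` are hypothetical names, and the parent sets are written out by hand rather than read from a fitted or learned network. The idea is that a node's CPT needs one cell for every combination of its own levels and its parents' levels, so the cell count is the product of those level counts, which can then be compared with the integer limit quoted in the question.

```r
# Number of cells in each node's conditional probability table (CPT):
# the product of the level counts of the node and all of its parents.
# `parent_sets` is a hypothetical named list: node name -> character vector of parents.
cpt_cells <- function(data, parent_sets) {
  sapply(names(parent_sets), function(node) {
    vars <- c(node, parent_sets[[node]])
    # Convert to double so the product itself cannot overflow R's integer type.
    prod(as.numeric(sapply(data[vars], nlevels)))
  })
}

# Toy data: only the factor levels matter here, not the observations.
toy <- data.frame(
  A = factor(1:16),          # 16 levels
  B = factor(rep(1:2, 8)),   # 2 levels
  C = factor(rep(1:4, 4))    # 4 levels
)

# Hand-written parent sets (assumed structure: A -> B, A -> C, B -> C).
parent_sets <- list(A = character(0), B = "A", C = c("A", "B"))

cells <- cpt_cells(toy, parent_sets)

# Flag any node whose CPT would exceed the integer limit (2,147,483,647).
too_big <- cells > .Machine$integer.max

print(cells)
print(too_big)

# For scale: a 16-level node with seven 16-level parents already needs
# 16^8 cells, which is above the limit.
print(16^8 > .Machine$integer.max)
```

With a bnlearn network `m`, the parent sets could presumably be built from `bnlearn::parents(m, node)` for each node, as in the comment above. Any node flagged in `too_big` is a candidate for fewer parents or collapsed factor levels.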

0 Answers