6

For a project, I need to create synthetic categorical data containing specific dependencies between the attributes. This can be done by sampling from a pre-defined Bayesian network. After some searching online, I found that Pomegranate is a good package for Bayesian networks. However, as far as I can tell, it seems impossible to sample from such a pre-defined Bayesian network with it. For example, model.sample() raises a NotImplementedError (even though this solution says it should work).

Does anyone know of a library that provides a good interface for constructing a Bayesian network and sampling from it?

Rutger Mauritz
  • Are you willing to: 1) switch languages or 2) implement sampling yourself? – kutschkem Dec 02 '19 at 07:42
  • Please note that questions asking for recommendations are usually off-topic here (see the [help center](https://stackoverflow.com/help/on-topic)). The first part is okay though. I don't know the answer, maybe the Pomegranate package isn't that mature so far. – NoDataDumpNoContribution Dec 02 '19 at 07:48
  • @kutschkem I am looking for a library that provides a good interface for **defining** a Bayesian Network from which I can then sample to obtain a synthetic data-set. – Rutger Mauritz Dec 02 '19 at 12:24

5 Answers

5

With pyAgrum, you just have to:

```python
import pyAgrum as gum

# Create a BN from a compact structure string: B has 3 states, C has the labels yes/No
bn = gum.fastBN("A->B[3]<-C{yes|No}->D")
# Override one CPT (fastBN fills all CPTs randomly)
bn.cpt("A").fillWith([0.3, 0.7])

# Generate a database of 1000 samples and write it to sample.csv
gum.generateCSV(bn, "sample.csv", 1000, with_labels=True, random_order=False)
# generateCSV returns the log-likelihood of the generated database
```

the code in a notebook

See http://webia.lip6.fr/~phw/aGrUM/docs/last/notebooks/ for more notebooks using pyAgrum

Disclaimer: I am one of the authors of pyAgrum :-)

  • Yes, I found out about this. Pretty cool that you're responding, because just yesterday I added a reference to pyAgrum (containing your name) to my paper. I'm using pyAgrum for many things, but mostly for inference in BNs with soft evidence! – Rutger Mauritz Jan 04 '20 at 11:03
4

Another option is pgmpy, a Python library for learning (structure and parameters) and inference (statistical and causal) in Bayesian networks.

You can generate forward and rejection samples as a Pandas dataframe or numpy recarray.
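To make the distinction between the two sampling modes concrete, here is a minimal library-free sketch. It is not pgmpy's implementation: the helper names `sample_student` and `rejection_sample` are hypothetical, and the probabilities are borrowed from the student network used in this answer's own example. Forward sampling draws each variable given its already-sampled parents; rejection sampling repeats that and keeps only the draws that agree with the evidence.

```python
import random

# P(grade | intel, diff), keyed by (intel, diff); each list sums to 1.
GRADE_CPD = {
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}


def sample_student(rng):
    """Draw one forward sample: parents first, then the child given its parents."""
    diff = rng.choices([0, 1], weights=[0.6, 0.4])[0]
    intel = rng.choices([0, 1], weights=[0.7, 0.3])[0]
    grade = rng.choices([0, 1, 2], weights=GRADE_CPD[(intel, diff)])[0]
    return {"diff": diff, "intel": intel, "grade": grade}


def rejection_sample(evidence, size, rng):
    """Keep forward samples that match the evidence until `size` are accepted."""
    accepted = []
    while len(accepted) < size:
        sample = sample_student(rng)
        if all(sample[var] == val for var, val in evidence.items()):
            accepted.append(sample)
    return accepted


rng = random.Random(42)
samples = rejection_sample({"grade": 0}, size=200, rng=rng)
print(samples[:3])
```

Note that this loop never terminates if the evidence has probability zero, and it becomes slow when the evidence is rare, which is the usual trade-off of rejection sampling.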

The following code generates 20 forward samples from the Bayesian network "diff -> grade <- intel" as a recarray.

```python
from pgmpy.models import BayesianModel  # called BayesianNetwork in later pgmpy releases
from pgmpy.factors.discrete import TabularCPD
from pgmpy.sampling import BayesianModelSampling

# Structure: diff -> grade <- intel
student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])

# Priors for the two root nodes (2 states each)
cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
# P(grade | intel, diff): one column per (intel, diff) combination, each summing to 1
cpd_g = TabularCPD(
    'grade', 3,
    [[0.3, 0.05, 0.9, 0.5],
     [0.4, 0.25, 0.08, 0.3],
     [0.3, 0.7, 0.02, 0.2]],
    evidence=['intel', 'diff'],
    evidence_card=[2, 2],
)

student.add_cpds(cpd_d, cpd_i, cpd_g)
inference = BayesianModelSampling(student)
samples = inference.forward_sample(size=20, return_type='recarray')

print(samples)
```

LoudlySoft
1

I found out that pyAgrum (https://agrum.gitlab.io/pages/pyagrum.html) does the job. It can be used both to create a Bayesian network via the BayesNet() class and to sample from such a network via the .drawSamples() method of the BNDatabaseGenerator() class.

Rutger Mauritz
1

Another option is BayesPy (https://www.bayespy.org/index.html). You build the network out of nodes, and on every node you can call random(), which essentially samples from that node's distribution: https://www.bayespy.org/dev_api/generated/generated/bayespy.inference.vmp.nodes.stochastic.Stochastic.random.html#bayespy.inference.vmp.nodes.stochastic.Stochastic.random

Christian
1

I was also searching for a Python library for working with Bayesian networks (learning, sampling, inference), and I found bnlearn. I tried a couple of examples and they worked. It can import several existing networks or any .bif file. According to the library's documentation:

Sampling of data is based on forward sampling from joint distribution of the Bayesian network. In order to do that, it requires as input a DAG connected with CPDs. It is also possible to create a DAG manually (see create DAG section) or load an existing one
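As a rough illustration of what forward sampling from "a DAG connected with CPDs" involves, here is a minimal sketch using only the Python standard library (Python 3.9+ for `graphlib`). It is not bnlearn's code: the `parents` and `cpds` dictionaries, the state labels, and the `forward_sample` helper are all hypothetical names chosen for this example. Nodes are visited in topological order, so every node is drawn after its parents.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical DAG: each node maps to the list of its parents.
parents = {"diff": [], "intel": [], "grade": ["intel", "diff"]}

# Hypothetical CPDs: node -> {tuple of parent values: {state: probability}}.
cpds = {
    "diff": {(): {"easy": 0.6, "hard": 0.4}},
    "intel": {(): {"low": 0.7, "high": 0.3}},
    "grade": {
        ("low", "easy"): {"A": 0.3, "B": 0.4, "C": 0.3},
        ("low", "hard"): {"A": 0.05, "B": 0.25, "C": 0.7},
        ("high", "easy"): {"A": 0.9, "B": 0.08, "C": 0.02},
        ("high", "hard"): {"A": 0.5, "B": 0.3, "C": 0.2},
    },
}


def forward_sample(parents, cpds, rng):
    """Draw one joint sample by visiting nodes so that parents come first."""
    sample = {}
    for node in TopologicalSorter(parents).static_order():
        # Look up the conditional distribution for the sampled parent values.
        key = tuple(sample[p] for p in parents[node])
        dist = cpds[node][key]
        sample[node] = rng.choices(list(dist), weights=list(dist.values()))[0]
    return sample


rng = random.Random(0)
rows = [forward_sample(parents, cpds, rng) for _ in range(1000)]
print(rows[:3])
```

Each element of `rows` is one synthetic record; writing them out with `csv.DictWriter` would give a categorical data set whose dependencies follow the DAG.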

Sachz