I am trying to execute the following code:

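import pickle

import pandas as pd
from pomegranate import BayesianNetwork

# Load the dataset of discrete random variables
X = pd.read_csv('dataframe.csv')

# Learn the network structure with the exact search algorithm
model = BayesianNetwork.from_samples(X, algorithm='exact')

# Save the learned structure to disk
with open('graph.pickle', 'wb') as f:
    pickle.dump(model.structure, f)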

Here dataframe.csv is a 5627x11 dataset of discrete random variables. If I run the code on a fraction of this dataset, it works, but with the entire dataset the kernel restarts without even starting BayesianNetwork.from_samples. What can be done to run the code in this case? Or is this a limitation of the model that cannot be changed?

donut
  • What environment is this running in? (e.g. Python 3.8 with Jupyter Notebook on Windows 10) – Alexander L. Hayes Nov 25 '20 at 14:13
  • What distributions characterize the discrete random variables? (e.g. Binary, 4 categories) – Alexander L. Hayes Nov 25 '20 at 14:13
  • Try setting max_parents=3 for now. That will greatly speed both greedy and exact searches. Otherwise, I second @Alexander: can you post X.head() and maybe X.describe(), or attach (some of) the dataset? Pandas doesn't always infer the way you expect. – ctwardy Dec 16 '20 at 20:52
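To make the max_parents suggestion above concrete, here is a rough sketch of how much a parent limit shrinks the search space for an 11-column dataset. It only counts how many distinct parent sets a single variable could have; it does not call pomegranate, and the helper name candidate_parent_sets is made up for illustration. It also says nothing about why the kernel restarts, which may have a different cause (for example memory or how pandas inferred the column types).

```python
from math import comb


def candidate_parent_sets(n_variables, max_parents=None):
    """Count the distinct parent sets available to one variable.

    Hypothetical helper for illustration only; not part of pomegranate.
    """
    others = n_variables - 1  # a variable cannot be its own parent
    limit = others if max_parents is None else min(max_parents, others)
    # Sum over parent-set sizes 0..limit of "others choose k"
    return sum(comb(others, k) for k in range(limit + 1))


n_variables = 11  # number of columns in the question's dataset
unrestricted = candidate_parent_sets(n_variables)
restricted = candidate_parent_sets(n_variables, max_parents=3)

print(f"parent sets per variable, no limit:      {unrestricted}")
print(f"parent sets per variable, max_parents=3: {restricted}")
```

If the limit helps, it would presumably be passed as BayesianNetwork.from_samples(X, algorithm='exact', max_parents=3), as the comment suggests; check that keyword against the pomegranate version you have installed.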

0 Answers