
New to StackOverflow and R.

I have a question about the different loss functions for cross-validation provided in the R package bnlearn, and which one I should use. I have continuous data (example below) with 32 rows and 8 columns; each column represents a species and each row holds the number of individuals of each species in a given year.

201  1.78e+08  18500000   1.87e+08   6.28e+07   1.08e+09     1.03e+08   7.22e+07   43100000
202  8.06e+07   9040000   5.04e+07   4.49e+07   6.66e+08     8.07e+07   2.58e+07   24100000
203  1.54e+08   4380000   1.51e+08   2.88e+07   9.94e+08     1.44e+08   7.32e+07   39000000
204  1.36e+08   6820000   3.80e+08   8.39e+06   7.38e+08     1.50e+08   4.25e+07   32600000
205  9.94e+07   9530000   8.99e+07   1.05e+07   6.62e+08     1.67e+08   1.90e+07   29200000
206  1.33e+08   6340000   4.27e+07   3.26e+06   5.31e+08     2.93e+08   2.70e+07   41500000
207  1.22e+08   5710000   4.41e+07   3.16e+06   4.58e+08     4.92e+08   4.02e+07   21600000
208  1.33e+08  13500000   1.20e+08   3.56e+06   4.40e+08     2.50e+08   3.93e+07   30000000
209  1.73e+08  21700000   4.35e+07   7.58e+06   5.62e+08     3.31e+08   4.98e+07   42100000
210  1.86e+08   6950000   3.40e+07   1.18e+07   4.41e+08     3.80e+08   4.83e+07   28100000

So far I have used tabu search to learn a fixed network structure and evaluated it with the cross-validation command

bn.cv(data = data, bn = bn.tabu, method = "k-fold", k = 10, runs = 100)

which gives the result

k-fold cross-validation for Bayesian networks

  number of folds:                       10 
  loss function:                         Log-Likelihood Loss (Gauss.) 
  number of runs:                        100 
  average loss over the runs:            151.8083 
  standard deviation of the loss:        0.2384763

The question is: which loss function should I use for my data so that I can swap in a different data set and still get comparable results, and what does "average loss over the runs" mean? The end goal is to build joint probability distributions and a prediction for year + 1, so basically a row 33 with values and their probability distributions.

Sorry for any inconsistencies, as I'm still learning statistics.

Lucius

1 Answer


I'm not sure I have understood your question correctly.

On the second question, what "average loss over the runs" means: with k = 10 and runs = 100, the 10-fold cross-validation is repeated 100 times, each time with a different random split of the data into folds. Each run yields one loss estimate, and the "average loss over the runs" is the mean of those 100 estimates. The standard deviation shows how much the estimate varies from one split to another.

On the first question, this page on the difference between a loss function and MLE is worth a look: https://stats.stackexchange.com/questions/339897/what-is-the-difference-between-loss-function-and-mle

Apologies for my English.
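To make the relationship between runs and the reported average concrete, here is a minimal sketch. It rests on several assumptions: bnlearn's built-in gaussian.test data stands in for the species data, runs is cut to 5 to keep it quick, and the target node "F" for the second call is an arbitrary choice for illustration. The loss labels "logl-g" and "mse" are bnlearn's labels for the Gaussian log-likelihood loss and the mean squared error.

```r
library(bnlearn)

# gaussian.test ships with bnlearn; it is only a stand-in for the species data.
data(gaussian.test)
set.seed(42)

# Learn a fixed structure with tabu search, as in the question.
bn.tabu <- tabu(gaussian.test)

# 10-fold cross-validation repeated 5 times (runs kept small for speed).
cv <- bn.cv(data = gaussian.test, bn = bn.tabu, loss = "logl-g",
            method = "k-fold", k = 10, runs = 5)

# loss() returns one value per run; the printed summary reports the mean
# and the spread of these values.
run.losses <- loss(cv)
mean(run.losses)
sd(run.losses)

# A loss tied to one target node (the arbitrarily chosen "F"),
# passed through loss.args.
cv.mse <- bn.cv(data = gaussian.test, bn = bn.tabu, loss = "mse",
                loss.args = list(target = "F"),
                method = "k-fold", k = 10, runs = 5)
loss(cv.mse)
```

Printing cv should show the same kind of summary as in the question, with the average matching mean(run.losses). The log-likelihood loss scores the whole network, whereas "mse" scores predictions for a single node and is expressed in the squared units of that node, which may matter when comparing results across data sets.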