I am working on a research problem and, because my dataset is small and organized by subject, I am trying to implement leave-N-out style analyses.

Currently I am doing this ad hoc, and I stumbled upon scikit-learn's LeavePGroupsOut class.

I read the docs, but I am unable to understand how to use it with a multidimensional array.

My data are as follows: I have 50 subjects, around 20 entries per subject (the number is not fixed), and 20 features per entry, with a ground-truth value (0 or 1) for every entry.

konsalex

1 Answer

Well, the documentation is actually pretty clear: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeavePGroupsOut.html#sklearn.model_selection.LeavePGroupsOut

In your case, you need to concatenate the per-subject arrays so that every entry (row) has a group index, here the subject it belongs to. With 50 subjects and roughly 20 entries each, the feature array has shape (n_samples, 20) with n_samples ≈ 1000, the label array has shape (n_samples,), and the group array must also have shape (n_samples,).
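As a minimal sketch of that concatenation step: the snippet below assumes the data start out as one array per subject, which the question does not state. The names `subject_X`, `subject_y`, and `entries_per_subject` are hypothetical, and the values are random placeholders rather than real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-subject data: 50 subjects, a variable number of
# entries per subject (15 to 25 here), 20 features per entry.
n_subjects, n_features = 50, 20
entries_per_subject = rng.integers(15, 26, size=n_subjects)
subject_X = [rng.normal(size=(n, n_features)) for n in entries_per_subject]
subject_y = [rng.integers(0, 2, size=n) for n in entries_per_subject]

# Stack everything into flat arrays with one row per entry.
X = np.concatenate(subject_X, axis=0)   # shape (n_samples, 20)
y = np.concatenate(subject_y)           # shape (n_samples,)

# One group label per row: the index of the subject the row came from.
groups = np.repeat(np.arange(n_subjects), entries_per_subject)

print(X.shape, y.shape, groups.shape)
```

The group labels only need to be consistent within a subject; integer indices are used here, but subject ID strings should work as well.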

Then you need to define the cross validation via

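from sklearn.model_selection import LeavePGroupsOut
lpgo = LeavePGroupsOut(n_groups=n_groups)  # n_groups: number of subjects held out in each test split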

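It's important to note that this yields every possible combination of held-out test groups: with 50 subjects and n_groups=p there are C(50, p) train/test splits (50 for p=1, 1225 for p=2), so the number of model fits grows quickly with p.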
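To make the splitting behaviour concrete, here is a small sketch that iterates over the splits. It uses a deliberately tiny, made-up dataset (5 subjects, 2 held out) so that all combinations can be enumerated quickly; the sizes and random values are assumptions for illustration only.

```python
from math import comb

import numpy as np
from sklearn.model_selection import LeavePGroupsOut

# Tiny placeholder dataset: 5 subjects with 3 to 5 entries each.
n_subjects, n_features, p = 5, 20, 2
rng = np.random.default_rng(0)
entries = rng.integers(3, 6, size=n_subjects)
groups = np.repeat(np.arange(n_subjects), entries)
X = rng.normal(size=(groups.size, n_features))
y = rng.integers(0, 2, size=groups.size)

lpgo = LeavePGroupsOut(n_groups=p)
n_splits = lpgo.get_n_splits(X, y, groups)
print(n_splits, comb(n_subjects, p))  # both count the subject pairs

held_out = []
for train_idx, test_idx in lpgo.split(X, y, groups):
    test_subjects = set(groups[test_idx])
    train_subjects = set(groups[train_idx])
    # No subject appears on both sides of a split.
    assert test_subjects.isdisjoint(train_subjects)
    held_out.append(tuple(sorted(int(g) for g in test_subjects)))
    # Fit on X[train_idx], y[train_idx]; evaluate on X[test_idx], y[test_idx].
```

The same splitter can also be handed to helpers such as cross_val_score via cv=lpgo together with groups=groups, instead of looping manually.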

Merk