
I would like to use PCA followed by feature selection/filtering in mlr3.

I have not yet found AIC or BIC for this "filtering" in the package/framework.

Is this because they do not fit conceptually? That is, are all methods in mlr3filters conceptually different from these information criteria, since the criteria select models rather than features? But in that case, shouldn't they be available under mlr_measures?

Or are they available via extension packages?
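For reference, the two criteria are AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂, where L̂ is the maximized likelihood, k the number of estimated parameters and n the number of observations; lower values are better. The sketch below is a minimal pure-Python illustration, assuming a binary target and a logistic regression fitted by maximum likelihood on the first m principal components (so k = m + 1, counting the intercept). The helper names and the probabilities are hypothetical and made up for illustration; they are not mlr3 functions or real results. The need for a proper likelihood and a well-defined parameter count is also why these criteria do not transfer to arbitrary learners.

```python
import math


def bernoulli_log_likelihood(y_true, p_pred, eps=1e-12):
    """Log-likelihood of binary labels under predicted probabilities P(y = 1)."""
    ll = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return ll


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical example: labels plus fitted probabilities from two logistic
# regressions, one on the first PC (k = 2: intercept + slope) and one on the
# first three PCs (k = 4). The numbers are invented for illustration only.
y = [0, 0, 1, 1, 1, 0, 1, 0]
p_one_pc = [0.3, 0.4, 0.6, 0.7, 0.6, 0.4, 0.7, 0.3]
p_three_pcs = [0.2, 0.3, 0.7, 0.8, 0.7, 0.3, 0.8, 0.2]
n = len(y)

for name, p, k in [("1 PC", p_one_pc, 2), ("3 PCs", p_three_pcs, 4)]:
    ll = bernoulli_log_likelihood(y, p)
    print(f"{name}: AIC={aic(ll, k):.2f}  BIC={bic(ll, k, n):.2f}")
```

BIC penalizes each extra component by ln n instead of 2, so for n > 7 it favours fewer components than AIC does.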

ds_col
  • You could always use cross validation to help identify best performing models (that are not overfit). – Marc in the box Mar 19 '21 at 08:20
  • Ok, but this would be another method for model selection. Or is it conceptually the same (after some proof that may not be obvious)? – ds_col Mar 19 '21 at 08:23
  • I am too unfamiliar with log likelihood approaches to say, but when CV is done right, it should identify the "best" model, and is flexible enough to be used to compare the performance of models that differ in their construction (i.e. where AIC is not applicable). The important concepts are 1. how to split your data into training and validation sets (e.g. k-fold) and 2. the selection of an appropriate fitness function (e.g. RMSE). – Marc in the box Mar 19 '21 at 08:31
  • Ok, so I understand that an ML approach is too general to allow for the "distributional assumptions" that would be needed for a log likelihood to qualify, correct? – ds_col Mar 19 '21 at 08:51
  • Then AIC is probably not for your model. Are you doing classification? Or is your response variable continuous? – Marc in the box Mar 19 '21 at 08:54
  • Continuous in the sense that it is a probability (a "prediction object") from a classifier. But this may not be suitable for the log likelihood framework. I am not sure, though, that no mathematical connection is possible for this kind of problem. – ds_col Mar 19 '21 at 09:41
  • These probabilities can of course be turned into binomial predictions, but you may not need to. You should look into the [F-score](https://en.wikipedia.org/wiki/F-score) or [ROC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) as possible fitness metrics. – Marc in the box Mar 19 '21 at 10:16
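
A minimal k-fold cross-validation sketch in pure Python, following the two points raised in the comments: how to split the data (k folds) and which fitness function to use (RMSE here). The two candidate models (`fit_mean`, `fit_line`) and the synthetic data are hypothetical stand-ins chosen for illustration; within mlr3 itself you would use its built-in resampling (e.g. `rsmp("cv")`) instead of hand-rolled folds.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def cross_validate(x, y, fit, k=5, seed=0):
    """Mean held-out RMSE of `fit` over k folds.

    `fit(x_train, y_train)` must return a function mapping one x to a prediction.
    """
    scores = []
    for fold in k_fold_indices(len(x), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        scores.append(rmse([y[i] for i in fold], [model(x[i]) for i in fold]))
    return sum(scores) / len(scores)


def fit_mean(xs, ys):
    """Hypothetical baseline: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda _x: m


def fit_line(xs, ys):
    """Hypothetical candidate: ordinary least squares for y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x0: a + b * x0


# Synthetic data (made up): a linear trend plus Gaussian noise.
rng = random.Random(42)
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + rng.gauss(0.0, 0.5) for xi in x]

cv_mean = cross_validate(x, y, fit_mean, k=5)
cv_line = cross_validate(x, y, fit_line, k=5)
print(f"mean model CV RMSE:   {cv_mean:.3f}")
print(f"linear model CV RMSE: {cv_line:.3f}")
```

The same loop compares any two models through their held-out error alone, so it also works for learners where AIC is not defined.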

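Both metrics suggested in the last comment can be computed from a classifier's predicted probabilities. F1 needs the probabilities turned into class labels at some threshold (0.5 here, an assumption), while ROC AUC uses only the ranking of the probabilities and needs no threshold. A minimal pure-Python sketch with toy data; the function names are illustrative, not an mlr3 API:

```python
def f1_at_threshold(y_true, p_pred, threshold=0.5):
    """F1 score after converting probabilities into 0/1 labels at `threshold`."""
    y_hat = [1 if p >= threshold else 0 for p in p_pred]
    tp = sum(1 for y, h in zip(y_true, y_hat) if y == 1 and h == 1)
    fp = sum(1 for y, h in zip(y_true, y_hat) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, y_hat) if y == 1 and h == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, p_pred):
    """ROC AUC as the probability that a random positive scores above a random
    negative (ties count 1/2). Needs at least one positive and one negative."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (made up for illustration).
y = [0, 0, 1, 1]
p = [0.1, 0.6, 0.4, 0.9]
print(f1_at_threshold(y, p))  # 0.5
print(roc_auc(y, p))  # 0.75
```

The pairwise AUC is O(n_pos · n_neg), which is fine for small examples; a rank-based computation scales better.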