
I cannot find models in mlr3 other than random forest, PART, kNN, SVM, GBM, etc. I am using the mlr3extralearners package, but it still seems that many models are not supported. How can I use nnet, MLP, etc. in mlr3? I have a regression problem.

Second point: is there a difference between the regr.randomForest and regr.ranger models? They seem to have the same parameters and default values. If they are the same and produce the same results, why are both provided instead of just one?

Thank you

Neha
  • 1) There are many learners implemented in mlr3, mlr3learners, and mlr3extralearners, including nnet. If you want to request more learners, you can do so in the mlr3extralearners repo. 2) They are two different implementations of the random forest algorithm; there are many blogs discussing it, [here](https://arikuncoro.xyz/blog/data-science/r-python-sql-linux/the-comparison-between-randomforest-and-ranger/) is the first I found on Google – RaphaelS May 06 '22 at 17:15
  • @RaphaelS, when I use `learner = lrn("regr.nnet")`, it shows `Error: Element with key 'regr.nnet' not found in DictionaryLearner!`. I have loaded both `library(mlr3learners)` and `library(mlr3extralearners)` – Neha May 06 '22 at 18:27
  • @RaphaelS, the article you mentioned says, I think, that the only difference between ranger and randomForest is the processing time, but when I train both on the same data with 10-fold CV, there is a difference between the mean absolute errors: randomForest has MAE = 0.48 and ranger has MAE = 0.52. – Neha May 06 '22 at 18:37
  • The difference is due to randomness I guess. – Pax May 06 '22 at 18:41
  • @Pax, they also produce slightly different rankings of features when I use permutation-based feature selection. – Neha May 06 '22 at 19:01
  • I would say this is due to the procedure, 10-fold CV or CV in general: "[cv] is a resampling method that uses different portions of the data to test and train a model on different iterations". As a consequence, there are always (slight) differences in the result, which average out over ... That is rather a good sign: differences, but small ones. – Pax May 06 '22 at 19:06
  • @Pax, thank you for your points. Do you know any article/literature which I can put as a reference to explain this point? I need to mention these details in my research paper but of course, I cannot cite StackOverflow comments. – Neha May 06 '22 at 19:14
  • In other words, if you do not set a seed (see `?set.seed`), you will get slightly different results for repeated executions of one and the same method, as long as you allow the CV procedure to use different portions of your data. If there are no overfitting issues, slight differences are in general a good sign. – Pax May 06 '22 at 19:14
  • @Pax, I have put `set.seed(123)` both when I split the data into train/test sets and before producing the feature rankings with these models. However, there is still a very small difference between the feature-importance rankings produced by ranger and randomForest. – Neha May 06 '22 at 19:21
  • I am pretty sure you will find what you are looking for in M. Kubat, "An Introduction to Machine Learning", or, quite comprehensively, on page 122, section 5.3.1 Cross-Validation, in "Deep Learning" by I. Goodfellow et al.; see https://github.com/janishar/mit-deep-learning-book-pdf/blob/master/complete-book-pdf/Ian%20Goodfellow%2C%20Yoshua%20Bengio%2C%20Aaron%20Courville%20-%20Deep%20Learning%20(2017%2C%20MIT).pdf for a complete PDF version – Pax May 06 '22 at 19:22
  • Regarding your last comment: for those differences you need to dive into the technical details and programming differences between the two implementations. If you search Google for those differences, you will probably find a place where involved people discuss those nuances. – Pax May 06 '22 at 19:23
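The seeding point from the comments above can be illustrated with a minimal base-R sketch (the fold vector here is hypothetical, purely to demonstrate reproducibility):

```r
# Without a fixed seed, any procedure involving random resampling
# (such as assigning observations to CV folds) gives slightly
# different results on each run. Fixing the seed reproduces the draws.
set.seed(123)
folds_a <- sample(rep(1:10, length.out = 100))  # a 10-fold CV assignment

set.seed(123)
folds_b <- sample(rep(1:10, length.out = 100))  # same seed, same draws

identical(folds_a, folds_b)  # TRUE
```

With the same seed, both calls to `sample()` produce the identical fold assignment; without it, each run would shuffle differently and downstream scores like MAE would wobble slightly.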

1 Answer


Filter this list by class "regr" to see which learners are available for a regression problem.
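You can also do the filtering directly in R (assuming mlr3 and mlr3learners are installed) by querying the learner dictionary with a key pattern:

```r
library(mlr3)
library(mlr3learners)  # registers additional learners in the dictionary

# All keys in the learner dictionary that start with "regr"
regr_keys <- mlr_learners$keys("^regr")
head(regr_keys)
```

Loading mlr3extralearners before the call would add its regression learners to the same dictionary.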

To initialise a learner, e.g. "regr.kknn", do:

# (1) mlr3 learners 
# install.packages("mlr3learners")
library(mlr3learners) 
#> Loading required package: mlr3
#mlr_learners

# (2) extra learners
# remotes::install_github("mlr-org/mlr3extralearners")
library(mlr3extralearners)
regr_kknn = lrn("regr.kknn")
#> Warning: Package 'kknn' required but not installed for Learner 'regr.kknn'
print(regr_kknn)
#> <LearnerRegrKKNN:regr.kknn>
#> * Model: -
#> * Parameters: k=7
#> * Packages: mlr3, mlr3learners, kknn
#> * Predict Type: response
#> * Feature types: logical, integer, numeric, factor, ordered
#> * Properties: -

Created on 2022-05-06 by the reprex package (v2.0.1)

Then set hyperparameters, train, predict, resample, etc. as described in the pleasant mlr3 book.
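For instance, a small end-to-end sketch with the built-in mtcars task and the rpart learner (assuming the rpart package is installed; the row split and `minsplit` value are arbitrary choices for illustration):

```r
library(mlr3)

task <- tsk("mtcars")                        # built-in regression task (target: mpg)
learner <- lrn("regr.rpart", minsplit = 10)  # set a hyperparameter at construction

# Train on some rows, predict on the rest
learner$train(task, row_ids = 1:20)
pred <- learner$predict(task, row_ids = 21:32)
pred$score(msr("regr.mae"))

# 10-fold cross-validation
set.seed(123)                                # reproducible fold assignment
rr <- resample(task, learner, rsmp("cv", folds = 10))
rr$aggregate(msr("regr.mae"))
```

The same pattern applies to any other learner key, such as "regr.kknn" above.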

Regarding your second question: I think {mlr3} does not implement learners on its own; instead it wraps several existing packages. Probably this is why both regr.randomForest (wrapping the randomForest package) and regr.ranger (wrapping the ranger package) are available.
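One way to see that they are distinct implementations is to benchmark them side by side (assuming the ranger, randomForest, and mlr3extralearners packages are installed). `benchmark_grid()` instantiates the resampling once per task, so both learners see exactly the same folds, and any remaining gap in MAE comes from the implementations themselves:

```r
library(mlr3)
library(mlr3learners)       # provides regr.ranger
library(mlr3extralearners)  # provides regr.randomForest

set.seed(123)               # reproducible fold assignment
design <- benchmark_grid(
  tasks       = tsk("mtcars"),
  learners    = lrns(c("regr.ranger", "regr.randomForest")),
  resamplings = rsmp("cv", folds = 10)
)
bmr <- benchmark(design)
bmr$aggregate(msr("regr.mae"))  # one MAE row per learner
```

With identical folds and a fixed seed, small remaining differences reflect implementation details (split criteria defaults, tie handling, internal RNG usage) rather than the resampling.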

Pax
  • it seems nnet is only available for classification problems, not regression. – Neha May 06 '22 at 18:26
  • Of course; that is why you cannot find it when you filter the mentioned list by "regr". – Pax May 06 '22 at 18:34