
I want to plot a learning curve in my application.

A sample curve image is shown below.

[Image: sample learning curve]

A learning curve is a plot of the following two variables:

  • X-axis: Number of samples (training set size).
  • Y-axis: Error (RSS, J(theta), or another cost function).

It helps show whether a model suffers from high bias or high variance.

Is there a package in R that can produce this plot?


1 Answer


You can make such a plot using the excellent caret package. The section on "Customizing the Tuning Process" will be very helpful.

You can also check out the well-written blog posts on R-Bloggers by Joseph Rickert, titled "Why Big Data? Learning Curves" and "Learning from Learning Curves".

UPDATE
I just posted an answer to the related question "Plot learning curves with caret package and R", which I think will be more useful to you. For convenience, I have reproduced that answer here. It uses the popular caret package to train the model and compute the RMSE for the training and test sets.

# load caret (provides createDataPartition, trainControl, train, postResample)
library(caret)

# set seed for reproducibility
set.seed(7)

# randomize mtcars
mtcars <- mtcars[sample(nrow(mtcars)),]

# split mtcars data into training and test sets
mtcarsIndex <- createDataPartition(mtcars$mpg, p = .625, list = FALSE)
mtcarsTrain <- mtcars[mtcarsIndex,]
mtcarsTest <- mtcars[-mtcarsIndex,]

# create empty data frame (RMSE columns are numeric, not integer)
learnCurve <- data.frame(m = integer(21),
                     trainRMSE = numeric(21),
                     cvRMSE = numeric(21))

# test data response feature
testY <- mtcarsTest$mpg

# Run algorithms using 10-fold cross validation with 3 repeats
trainControl <- trainControl(method="repeatedcv", number=10, repeats=3)
metric <- "RMSE"

# loop over training examples
for (i in 3:21) {
    learnCurve$m[i] <- i

    # train learning algorithm with size i
    fit.lm <- train(mpg~., data=mtcarsTrain[1:i,], method="lm", metric=metric,
             preProc=c("center", "scale"), trControl=trainControl)        
    # resampled (cross-validated) RMSE on the first i training rows
    learnCurve$trainRMSE[i] <- fit.lm$results$RMSE

    # use trained parameters to predict on test data
    prediction <- predict(fit.lm, newdata = mtcarsTest[,-1])
    rmse <- postResample(prediction, testY)
    learnCurve$cvRMSE[i] <- rmse[1]
}

pdf("LinearRegressionLearningCurve.pdf", width = 7, height = 7, pointsize=12)

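# plot learning curves of training set size vs. log error
# for training set and test set (rows 1-2 are never filled, so skip them)
rows <- 3:21
yRange <- range(log(c(learnCurve$trainRMSE[rows], learnCurve$cvRMSE[rows])), finite = TRUE)
plot(learnCurve$m[rows], log(learnCurve$trainRMSE[rows]), type = "o", col = "red",
     ylim = yRange, xlab = "Training set size",
     ylab = "Error (log RMSE)", main = "Linear Model Learning Curve")
lines(learnCurve$m[rows], log(learnCurve$cvRMSE[rows]), type = "o", col = "blue")
legend('topright', c("Train error", "Test error"), lty = c(1,1), lwd = c(2.5, 2.5),
       col = c("red", "blue"))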

dev.off()

The output plot is shown below:
[Image: MtCarsLearningCurve.png]
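
If you would rather not depend on caret, the same idea can be sketched in base R. This is only a minimal illustration, not the code from the answer above: the three predictors (wt, hp, disp), the fixed 20/12 split, the range of training sizes, and the names rmse and lc are all assumptions picked for the example.

```r
# Minimal base-R learning curve: train/test RMSE vs. training set size.
# Assumptions (illustrative only): predictors wt, hp, disp; sizes 6..20;
# first 20 shuffled rows for training, remaining 12 for testing.
set.seed(7)

# hypothetical helper: root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# shuffle mtcars and hold out the last 12 rows as a test set
shuffled <- mtcars[sample(nrow(mtcars)), ]
trainSet <- shuffled[1:20, ]
testSet  <- shuffled[21:32, ]

sizes <- 6:20
lc <- data.frame(m = sizes, trainRMSE = NA_real_, testRMSE = NA_real_)

for (k in seq_along(sizes)) {
  # fit on the first sizes[k] training rows only
  subsetK <- trainSet[seq_len(sizes[k]), ]
  fit <- lm(mpg ~ wt + hp + disp, data = subsetK)

  # error on the rows the model was fitted to, and on the held-out rows
  lc$trainRMSE[k] <- rmse(subsetK$mpg, predict(fit, newdata = subsetK))
  lc$testRMSE[k]  <- rmse(testSet$mpg, predict(fit, newdata = testSet))
}

# plot both curves on a shared y-axis
plot(lc$m, lc$trainRMSE, type = "o", col = "red",
     ylim = range(lc$trainRMSE, lc$testRMSE),
     xlab = "Training set size", ylab = "Error (RMSE)",
     main = "Learning curve (base R sketch)")
lines(lc$m, lc$testRMSE, type = "o", col = "blue")
legend("topright", c("Train error", "Test error"), lty = 1,
       col = c("red", "blue"))
```

When reading such a plot, a large gap that persists between the two curves usually points to high variance, while two curves that converge at a high error usually point to high bias. With a data set as small as mtcars the curves are noisy, so treat the shape as indicative only.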
