
AIM: To find a suitable step-function fit that uses age to describe wage in the Wage dataset from the ISLR library.


PLAN:

To find a suitable fit, I'll try several fits, each with a different number of cut points. I'll fit each model with the glm() function, and to check which fit is best I'll use the cv.glm() function (from the boot library) to cross-validate the fitted model.


PROBLEM:

In order to do so, I did the following:

all.cvs = rep(NA, 10)
for (i in 2:10) {
  lm.fit = glm(wage~cut(Wage$age,i), data=Wage)
  all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}

But this gives an error:

Error in model.frame.default(formula = wage ~ cut(Wage$age, i), data =
list( :    variable lengths differ (found for 'cut(Wage$age, i)')

However, the code below runs without error. (It can be found here.)

all.cvs = rep(NA, 10)

for (i in 2:10) {
  Wage$age.cut = cut(Wage$age, i)
  lm.fit = glm(wage~age.cut, data=Wage)
  all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}

HYPOTHESES AND RESULTS:

  1. Perhaps cut() and glm() do not work together. But this works:

    glm(wage~cut(age,4),data=Wage)
    

Question: So the version that works calls cut(), saves its result in a variable, and then uses that variable in glm(). Calling cut() directly inside the glm() formula fails, but apparently only when the code is in a loop.

So, why is the first version of the code not working?

This is confusing. Any help appreciated.

Mooncrater
  • `cv.glm` needs the same variables given as input of the `glm` function. Using `glm(wage~cut(age,4),data=Wage)` you create a new variable inside `glm` that is not available in `Wage` – Marco Sandri Sep 25 '17 at 19:16
  • @MarcoSandri So, since `cv.glm()` has `Wage` as an argument, `glm()` can have **only** `Wage`'s attributes as its formula's arguments. Is that what you mean? – Mooncrater Sep 25 '17 at 19:22
  • Right! `cv.glm` needs to "know" the variables used in the `glm` model. – Marco Sandri Sep 25 '17 at 20:00
  • https://stackoverflow.com/questions/42190337/ – Luce Apr 18 '18 at 03:47
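
The point made in the comments can be illustrated with a small sketch. It relies on the fact that cv.glm() refits the model on row subsets of the data it is given; a term such as `Wage$age` inside the formula ignores the `data` argument and always returns the full column, so its length no longer matches the subsetted response. The data frame `toy` and the subset `train` below are hypothetical stand-ins (not the real Wage data) so that the example runs without ISLR, and the seed is arbitrary.

```r
library(boot)  # provides cv.glm(); glm() itself comes from the stats package

# Hypothetical stand-in for ISLR's Wage data (assumption, not the real dataset).
set.seed(1)
toy <- data.frame(age = runif(200, 18, 80))
toy$wage <- 60 + 0.8 * toy$age + rnorm(200, sd = 10)

# Mimic by hand one of the refits that cv.glm() performs on a training fold.
train <- toy[1:180, ]

# `toy$age` always has 200 values, while `wage` is taken from the 180-row
# subset, so building the model frame fails with a length mismatch.
msg <- tryCatch(
  {
    glm(wage ~ cut(toy$age, 4), data = train)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Storing the cut as a column means it is subset together with `wage`,
# so every refit inside cv.glm() sees variables of matching length.
toy$age.cut <- cut(toy$age, 4)
fit <- glm(wage ~ age.cut, data = toy)
cv.err <- cv.glm(toy, fit, K = 10)$delta[2]
print(cv.err)
```

Under this reading, the standalone call `glm(wage~cut(age,4),data=Wage)` succeeds because both variables are looked up in the full `Wage` data frame and no refit on a subset takes place; the mismatch only appears once cv.glm() swaps in a smaller data frame.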
