I'm using the glmnet package in R for ridge regression, and I tried it on the Hitters dataset from the ISLR package. The problem is that when I use model.matrix to create the design matrix, the number of observations drops for no reason I can see. This is the code:

library(ISLR)
library(glmnet)

data("Hitters")

set.seed(1)
train=sample(1:nrow(Hitters), nrow(Hitters)/2)
test=(-train)

train.data = Hitters[train,]
test.data = Hitters[test,]
train.x=model.matrix(Salary~.,train.data)[,-1]
train.y=train.data$Salary

In the code, I'm trying to predict the Salary variable from all the other variables. train.data has 161 observations, while train.x has only 131. I don't understand why this happens and would appreciate any help.

Ashley

1 Answer

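You have NA values in the Salary field. model.matrix builds a model frame first, and the default na.action (na.omit) drops every row that has a missing value in any variable used in the formula, so those players never make it into train.x.

You can list the dropped rows like this: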

missing.players <- setdiff(rownames(train.data), rownames(train.x))
train.data[missing.players, ]
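
One way to keep the design matrix and the response aligned is to drop the incomplete rows before building either of them. The sketch below is only an illustration: it uses a small made-up data frame (`toy`, with hypothetical values) in place of Hitters so it runs with base R alone, and it assumes you are happy to discard players with a missing Salary rather than impute it.

```r
# Toy stand-in for Hitters (hypothetical values); Salary has two NAs.
toy <- data.frame(
  Salary = c(100, NA, 300, 400, NA, 600),
  Hits   = c(10, 20, 30, 40, 50, 60),
  League = factor(c("A", "N", "A", "N", "A", "N"))
)

# How many responses are missing?
n.missing <- sum(is.na(toy$Salary))

# model.matrix() silently drops the rows with NA (default na.action is na.omit),
# so this matrix has fewer rows than the data frame it came from.
x.dropped <- model.matrix(Salary ~ ., toy)[, -1]

# Remove incomplete rows first, then build x and y from the same data frame
# so they are guaranteed to have matching rows.
toy.complete <- na.omit(toy)
x <- model.matrix(Salary ~ ., toy.complete)[, -1]
y <- toy.complete$Salary

c(rows.in.data = nrow(toy), rows.in.x = nrow(x), length.y = length(y))
```

Applied to the question's code, the same idea would be to call na.omit on Hitters before sampling the train/test indices, so that train.x and train.y end up with the same number of rows.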
Bulat