
I am trying to understand an R script and I came across this line:

train <- cbind(train[,c(1,2)],model.matrix(~ -1 + .,train[,-c(1,2)]))

train is a data.frame. I think the line combines the first two columns of train with all the other columns after they have been through some sort of matrix transformation. However, I cannot understand exactly what the model formula(?) is doing. According to a comment in the script, its purpose is to turn all the other columns into 0s and 1s, but I'm not sure how. If someone could clarify that, it would be great. Thanks!

Rich Scriven
IvonLiu
    minus 1 removes the intercept, like + 0, in R formulas – Rorschach Jul 26 '15 at 04:45
    Negative duplicate of [What does the R formula y~1 mean?](http://stackoverflow.com/questions/13366755/what-does-the-r-formula-y1-mean). – Molx Jul 26 '15 at 04:53
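A minimal sketch of the point in the first comment, using a hypothetical one-column data frame dat (not from the question): the default formula adds an "(Intercept)" column of 1s, while "- 1" and "+ 0" both leave it out and produce the same matrix.

```r
# Hypothetical toy data, for illustration only
dat <- data.frame(x = c(1.5, 2.0, 3.5))

with_intercept <- model.matrix(~ x, dat)      # default: "(Intercept)" column of 1s plus x
minus_one      <- model.matrix(~ x - 1, dat)  # "- 1" drops the intercept column
plus_zero      <- model.matrix(~ x + 0, dat)  # "+ 0" is an equivalent way to drop it

colnames(with_intercept)       # "(Intercept)" "x"
colnames(minus_one)            # "x"
all(minus_one == plus_zero)    # TRUE
```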

1 Answer


From ?formula:

The - operator removes the specified terms... [i]t can also [be] used to remove the intercept term: when fitting a linear model y ~ x - 1 specifies a line through the origin.
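To see the "line through the origin" behaviour from the quoted help page, here is a small sketch with simulated data (the data and object names are made up for illustration): lm(y ~ x - 1) estimates only a slope, so its prediction at x = 0 is exactly 0.

```r
# Simulated data (hypothetical): y is roughly 2 * x plus a little noise
set.seed(1)
x <- 1:10
y <- 2 * x + rnorm(10, sd = 0.1)

fit_default <- lm(y ~ x)      # estimates an intercept and a slope
fit_origin  <- lm(y ~ x - 1)  # "- 1": slope only, line forced through (0, 0)

names(coef(fit_default))  # "(Intercept)" "x"
names(coef(fit_origin))   # "x"

# With no intercept, the prediction at x = 0 is exactly 0
unname(predict(fit_origin, newdata = data.frame(x = 0)))  # 0
```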

Further:

There are two special interpretations of . in a formula. The usual one is in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’
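A short sketch of the "." expansion, using a hypothetical data frame dat: in y ~ . the dot stands for every column except y, whereas in a one-sided formula like the one in the question nothing else is in the formula, so the dot picks up every column of the data.

```r
# Hypothetical data frame with one response and two predictors
dat <- data.frame(y = c(1, 3, 2, 5), a = c(2, 4, 1, 7), b = c(0, 1, 1, 0))

# "." means every column of dat not already in the formula, so y ~ . is y ~ a + b
fit_dot      <- lm(y ~ ., data = dat)
fit_explicit <- lm(y ~ a + b, data = dat)
all.equal(coef(fit_dot), coef(fit_explicit))  # TRUE

# In a one-sided formula nothing is "otherwise in the formula",
# so ~ . uses every column, including y
colnames(model.matrix(~ ., dat))  # "(Intercept)" "y" "a" "b"
```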

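So you have a one-sided formula (there is no response) that uses all variables in train[,-c(1,2)] and has no intercept term. model.matrix() turns this into a design matrix: numeric columns are kept as they are, and factor or character columns are expanded into indicator columns of 0s and 1s, which is where the 0s and 1s mentioned in the script's comment come from.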
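Putting it together, here is a sketch that runs the line from the question on a made-up train (the column names and values are assumptions, not from the original script): the two leading columns pass through unchanged, age stays numeric, and the factors become 0/1 indicator columns.

```r
# Hypothetical stand-in for train: two leading columns that are kept as-is,
# then one numeric column and two factor columns
train <- data.frame(
  id     = 1:4,
  target = c(0, 1, 1, 0),
  age    = c(23, 35, 41, 29),
  color  = factor(c("red", "blue", "red", "green")),
  size   = factor(c("S", "L", "M", "S"))
)

# The line from the question: keep columns 1-2, expand the rest with model.matrix()
train <- cbind(train[, c(1, 2)], model.matrix(~ -1 + ., train[, -c(1, 2)]))

names(train)
# "id" "target" "age" "colorblue" "colorgreen" "colorred" "sizeM" "sizeS"
```

Because there is no intercept, the first factor (color here) gets one indicator column per level, while later factors (size) use R's default treatment contrasts and drop their first level (L).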

MichaelChirico
jeremycg