So you are manually creating three dummy columns in Excel and want to import them into R? As long as you import these columns as numeric rather than factor, there will be no problem.
Still, I should point out that R can code a factor into dummy variables itself, via model.matrix(). So there is never any need to do this yourself. It is perfectly fine to use a single column with "red", "blue" and "yellow" in Excel and import it into R as a factor.
colour <- gl(3,2,labels=c("red","blue","yellow"))
model.matrix(~ colour - 1)
#   colourred colourblue colouryellow
# 1         1          0            0
# 2         1          0            0
# 3         0          1            0
# 4         0          1            0
# 5         0          0            1
# 6         0          0            1
Just another quick question. If I use model.matrix for the factor colour and other factor variables, how can I incorporate this into my model? When I fit a linear model, for example lm(response ~ predictor.1 + predictor.2 + colour), will it create the dummy variables automatically, or do I need to assign the model.matrix to a vector?
model.matrix is a service routine for model fitting routines like lm, glm, etc. You can simply supply a formula, and the model matrix will be constructed behind the scenes. So you don't even need to obtain a model matrix yourself.
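To see this for yourself, here is a minimal sketch on made-up data. The data frame dat and its random values are hypothetical; only the variable names are borrowed from the question. It fits the model with a plain formula and then inspects the design matrix that lm built.

```r
set.seed(1)

# Hypothetical toy data; column names mirror the question above
dat <- data.frame(
  response    = rnorm(12),
  predictor.1 = rnorm(12),
  predictor.2 = rnorm(12),
  colour      = gl(3, 4, labels = c("red", "blue", "yellow"))
)

# Pass the factor straight into the formula; no manual dummy coding
fit <- lm(response ~ predictor.1 + predictor.2 + colour, data = dat)

# Column names of the design matrix that lm() constructed internally
colnames(model.matrix(fit))
# "(Intercept)" "predictor.1" "predictor.2" "colourblue" "colouryellow"
```

Note that with an intercept in the model and R's default treatment contrasts, the first level ("red") becomes the baseline, so only two dummy columns appear for colour.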
Advanced users sometimes want to call the internal fitting routines lm.fit or even .lm.fit directly. Read ?lm.fit for those routines. They do not accept a model formula, but a model matrix X and a response vector y. In that situation, the user is fully responsible for generating X and y.
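As a rough illustration of that lower-level route, the sketch below builds X with model.matrix and passes it to lm.fit. The data are random and the names x1, fit_low and fit_high are made up for this example.

```r
set.seed(1)

# Hypothetical toy data
colour <- gl(3, 4, labels = c("red", "blue", "yellow"))
x1     <- rnorm(12)
y      <- rnorm(12)

# lm.fit() takes no formula, so build the design matrix by hand
X <- model.matrix(~ x1 + colour)

# Low-level fit: design matrix and response vector only
fit_low <- lm.fit(X, y)
fit_low$coefficients

# Compare with the formula interface on the same data
fit_high <- lm(y ~ x1 + colour)
all.equal(fit_low$coefficients, coef(fit_high))
```

The object returned by lm.fit is a plain list rather than an "lm" object, so methods such as summary() or predict() are not available for it; that is part of the trade-off of bypassing lm.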