Let me dive directly into an example to show my problem:
rm(list=ls())
n <- 100
df <- data.frame(y=rnorm(n), x1=rnorm(n), x2=rnorm(n) )
fm <- lm(y ~ x1 + poly(x2, 2), data=df)
Now, I would like to have a look at the previously used data. This is almost available by using
temp.data <- fm$model
However, x2 will have been split up into poly(x2, 2), which itself has two columns, one for the linear and one for the quadratic component. Note that it may seem as if x2 is contained here, but since the polynomial uses orthogonal components, the values stored in temp.data are not the same as df$x2. This can also be seen if you compare the variables visually after, say, the following: new.dat <- cbind(df, fm$model).
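To make the mismatch concrete, here is a minimal sketch of the comparison I have in mind. The set.seed() call and the helper names p and same_values are my own additions for illustration; everything else follows the example above.

```r
# Recreate the example; set.seed() is only added so the run is reproducible.
set.seed(1)
n  <- 100
df <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n))
fm <- lm(y ~ x1 + poly(x2, 2), data = df)

temp.data <- fm$model
names(temp.data)                    # "y" "x1" "poly(x2, 2)"

# The polynomial term is stored as one column that holds a 100 x 2 matrix.
p <- temp.data[["poly(x2, 2)"]]
dim(p)                              # 100 2

# Its first component is a centred and rescaled version of x2, not x2 itself.
same_values <- isTRUE(all.equal(as.numeric(p[, 1]), df$x2))
same_values                         # FALSE

# It is still an exact linear function of x2, so the correlation is 1 in absolute value.
abs(cor(p[, 1], df$x2))
```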
Now, to some questions:
First, and most importantly, is there a way to retrieve x2 from the lm-object in its original form? Or more generally, if some function f has been applied to some variable in the lm-formula, can the underlying variables be extracted from the lm-object (without doing case-specific math)? Note that I know I could retrieve the data by other means, but I wonder if I can extract it from the lm-object itself.
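For reference, the kind of "other means" I am thinking of looks roughly like the sketch below. It assumes that df still exists, unchanged, in the environment where the formula was created, so it goes back to the data rather than pulling x2 out of the fitted object itself. The names orig and raw are just illustrative, and set.seed() is added for reproducibility.

```r
set.seed(1)
n  <- 100
df <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n))
fm <- lm(y ~ x1 + poly(x2, 2), data = df)

# Re-evaluate the `data` argument recorded in the call.
# This fails, or silently gives different data, if df was removed or modified.
orig <- eval(getCall(fm)$data, environment(formula(fm)))

# Pull out the untransformed variables that appear in the formula.
raw <- get_all_vars(formula(fm), data = orig)
names(raw)                          # "y" "x1" "x2"
```

Here raw$x2 matches df$x2, but only because df is looked up again, which is exactly what I would like to avoid.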
Second, on a more general note, since I explicitly did not ask for model.matrix(fm), why do I get data that has been manipulated? What is the underlying philosophy behind that? Does anyone know?
Third, the command head(new.dat) shows me that x2 has been split up into two components. What I see when I type View(new.dat) is, however, only one column. This strikes me as puzzling and mind-boggling. How can two columns be represented as one, and why is there a difference between head and View? If anyone can explain, I would be highly indebted!
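The discrepancy can also be seen without any viewer by counting columns in two different ways. This is only a sketch of what I observe, not an explanation; set.seed() is added for reproducibility.

```r
set.seed(1)
n  <- 100
df <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n))
fm <- lm(y ~ x1 + poly(x2, 2), data = df)
new.dat <- cbind(df, fm$model)

# As a data frame there are six columns: y, x1, x2, y, x1 and the poly term.
ncol(new.dat)                           # 6

# The poly term is one column that itself holds a two-column matrix.
is.matrix(new.dat[["poly(x2, 2)"]])     # TRUE

# Flattening to a plain matrix expands that column into two.
ncol(as.matrix(new.dat))                # 7
```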
If these questions are too basic, I apologize. In that case, I would appreciate any pointers to relevant manuals where this is explained.
Thanks in advance!