getting least squares and residuals by comparing data

Question

I have generated a set of simulated data (df1). I have a second set of data (df2) that I would like to compare against it, to see whether df1 can explain the observations in df2.

Ideally I'd like to plot the residuals and calculate least squares, but I am not sure how to do this when I am comparing one set of data with another.

df1
      time n
 0.0000000 1
 0.1268725 2
 1.3128176 3
 3.1765056 4
 3.4091914 5
 4.1245285 6
 8.4518769 9
 9.8119399 10

df2
  n  time
  0  0
 37  1
 97  2
157  3

What does "explain the observations of" mean exactly? What type of analysis do you wish to do? — MrFlick, Jul 21 '14 at 22:13
I would like to know if the model that I've made to generate the simulated data can explain df2 — user3141121, Jul 21 '14 at 22:15
I want to do model fitting of df2 to df1 and I want to get residuals and least squares just like you would from lm(). — user3141121, Jul 21 '14 at 22:23
You can't just "fit" data from one set to another. That doesn't make sense. Do you want to fit a linear model to df1 and then use that fit to calculate residuals from df2? That's a bit more precise. You need a clear modeling strategy. — MrFlick, Jul 21 '14 at 22:27
It doesn't make sense to do "least squares" on two data sets. There has to be some model parameter that you're varying in order to minimize the sum of squares of the residuals. — Jim Lewis, Jul 21 '14 at 22:27
Thanks for the clarification. When I fit a linear model to df1 and then use df2 to calculate residuals the residuals do not look good and I get this warning: _Warning message: 'newdata' had 4 rows but variables found have 10 rows_ I am now using `predict.lm()` — user3141121, Jul 21 '14 at 22:55

jlhoward · Answer 1 · 2014-07-22T14:42:37.460

So you seem to be asking: does the fit generated using the training set (df1) do well on the test set (df2)? Here's one way to get at this:

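# Fit a linear model to the simulated (training) data
fit <- lm(n ~ time, df1)
par(mfrow=c(1,2))
# Left: df1 with the fitted line; right: residuals vs. fitted values
with(df1,plot(time,n))
with(df1,lines(time,predict(fit),col="blue",lty=2))
plot(fit,1)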

# Predict n for df2 from the df1 fit, then take observed minus predicted
df2$pred <- predict(fit,df2)
df2$resid <- with(df2,n-pred)
with(df2,plot(time,n))
with(df2,lines(time,pred,col="blue",lty=2))
with(df2,plot(n,resid,type="b"))
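
To put a number on the mismatch rather than judging it from the plots alone, one option is to compare the residual sum of squares of the fit on df1 with the same quantity computed on df2. This is only a sketch: it assumes the data exactly as printed in the question, and the names res2, rss_train, rss_test and rmse_test are illustrative, not from the question.

```r
# Data as printed in the question
df1 <- data.frame(
  time = c(0.0000000, 0.1268725, 1.3128176, 3.1765056,
           3.4091914, 4.1245285, 8.4518769, 9.8119399),
  n    = c(1, 2, 3, 4, 5, 6, 9, 10)
)
df2 <- data.frame(n = c(0, 37, 97, 157), time = c(0, 1, 2, 3))

# Fit on the simulated data only
fit <- lm(n ~ time, data = df1)

# Residuals of df2 against the df1 fit: observed minus predicted
pred <- predict(fit, newdata = df2)
res2 <- df2$n - pred

# Sum of squared residuals on each data set, and RMSE on df2
rss_train <- sum(residuals(fit)^2)
rss_test  <- sum(res2^2)
rmse_test <- sqrt(mean(res2^2))

print(c(rss_train = rss_train, rss_test = rss_test, rmse_test = rmse_test))
```

If rss_test is far larger than rss_train, the line fitted to df1 is not describing df2. Note that nothing is being minimised on df2 here; the least-squares step happens only when fitting df1.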

So the answer is "no": the fit does not explain the data in df2 well. The values of n predicted by the model are much lower than the "actual" values of n.
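
As for the warning quoted in the comments ('newdata' had 4 rows but variables found have 10 rows): it typically appears when the model was fitted with a formula that names the data frame directly, such as lm(df1$n ~ df1$time), so that predict() cannot substitute the columns of df2. This is a guess, since the asker's code is not shown. A sketch of the difference, using the data as printed in the question (8 rows, so the row count in the warning will differ); fit_bad and fit_ok are illustrative names.

```r
# Data as printed in the question
df1 <- data.frame(
  time = c(0.0000000, 0.1268725, 1.3128176, 3.1765056,
           3.4091914, 4.1245285, 8.4518769, 9.8119399),
  n    = c(1, 2, 3, 4, 5, 6, 9, 10)
)
df2 <- data.frame(n = c(0, 37, 97, 157), time = c(0, 1, 2, 3))

# Formula hard-codes df1: newdata is effectively ignored and a warning is raised
fit_bad <- lm(df1$n ~ df1$time)
p_bad   <- suppressWarnings(predict(fit_bad, newdata = df2))
length(p_bad)  # one prediction per row of df1, not df2

# Formula uses bare column names plus data=: newdata is honoured
fit_ok <- lm(n ~ time, data = df1)
p_ok   <- predict(fit_ok, newdata = df2)
length(p_ok)   # one prediction per row of df2
```

The second form is the one used in the answer above, which is why predict(fit, df2) returns exactly four values there.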
