
I have a set of simulated data (df1) I've generated. I have a second set of data (df2) that I would like to compare and see if df1 can explain the observations of df2.

Ideally I'd like to plot the residuals and calculate least squares but I am not sure how to do this when I am comparing one set of data with another.

df1
      time n
 0.0000000 1
 0.1268725 2
 1.3128176 3
 3.1765056 4
 3.4091914 5
 4.1245285 6
 8.4518769 9
 9.8119399 10

df2
  n  time
  0  0
 37  1
 97  2
157  3
user3141121
  • What does "explain the observations of" mean exactly? What type of analysis do you wish to do? – MrFlick Jul 21 '14 at 22:13
  • I would like to know if the model that I've made to generate the simulated data can explain the df2 – user3141121 Jul 21 '14 at 22:15
  • Explain it how? What does that mean mathematically? – MrFlick Jul 21 '14 at 22:16
  • I want to do model fitting of df2 to df1 and I want to get residuals and least squares just like you would from lm(). – user3141121 Jul 21 '14 at 22:23
  • You can't just "fit" data from one set to another. That doesn't make sense. Do you want to fit a linear model to df1 and then use that fit to calculate residuals from df2? That's a bit more precise. You need a clear modeling strategy. – MrFlick Jul 21 '14 at 22:27
  • It doesn't make sense to do "least squares" on two data sets. There has to be some model parameter that you're varying in order to minimize the sum of squares of the residuals. – Jim Lewis Jul 21 '14 at 22:27
  • Thanks for the clarification. When I fit a linear model to df1 and then use df2 to calculate residuals, the residuals do not look good and I get this warning: _Warning message: 'newdata' had 4 rows but variables found have 10 rows_. I am now using `predict.lm()`. – user3141121 Jul 21 '14 at 22:55

1 Answer

So you seem to be asking: does the fit generated from the training set (df1) do well on the test set (df2)? Here's one way to get at this:

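# Fit a linear model to the training (simulated) data, df1
fit <- lm(n ~ time, data = df1)
par(mfrow=c(1,2))
# Training data with the fitted line, then residuals vs. fitted values
with(df1,plot(time,n))
with(df1,lines(time,predict(fit),col="blue",lty=2))
plot(fit,1)

# Apply the df1 fit to the test data, df2, and compute residuals
df2$pred <- predict(fit,df2)
df2$resid <- with(df2,n-pred)
# Observed df2 values with the predicted line, then residuals against n
with(df2,plot(time,n))
with(df2,lines(time,pred,col="blue",lty=2))
with(df2,plot(n,resid,type="b"))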

So the answer is "no": the fit from df1 does not explain the data in df2 well. Values of n predicted by the model are much lower than the "actual" values of n.
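If a single number is wanted alongside the plots, the same comparison can be summarised as a residual sum of squares on df2. The following is a minimal, self-contained sketch that rebuilds both data frames from the values posted in the question; the names `pred`, `resid`, `rss` and `rmse` are illustrative choices, not part of the original code. It also fits the model with a formula plus `data =`, which matters for the warning mentioned in the comments: a common cause of "'newdata' had 4 rows but variables found have ... rows" is fitting with `lm(df1$n ~ df1$time)`, in which case `predict()` cannot find the predictor in `newdata` and silently falls back to the training data.

```r
# Data as posted in the question (df1: simulated, df2: observed)
df1 <- data.frame(
  time = c(0.0000000, 0.1268725, 1.3128176, 3.1765056,
           3.4091914, 4.1245285, 8.4518769, 9.8119399),
  n    = c(1, 2, 3, 4, 5, 6, 9, 10)
)
df2 <- data.frame(n = c(0, 37, 97, 157), time = c(0, 1, 2, 3))

# Fit on df1 using a formula and data=, so that predict() can
# look up the column 'time' by name in newdata
fit <- lm(n ~ time, data = df1)

# Out-of-sample predictions and residuals for df2
pred  <- predict(fit, newdata = df2)
resid <- df2$n - pred

# Summary measures of how badly the df1 fit misses df2
rss  <- sum(resid^2)          # residual sum of squares
rmse <- sqrt(mean(resid^2))   # root mean squared error

print(data.frame(df2, pred = pred, resid = resid))
cat("RSS:", rss, " RMSE:", rmse, "\n")
```

Note that nothing is being minimised here: the parameters are fixed by the fit to df1, and the sum of squares only measures how far df2 lies from that fixed line.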

jlhoward