10

I am a beginner in curve fitting and several posts on Stackoverflow really helped me.

I tried to fit a sine curve to my data using lm and nls but both methods show a strange fit as shown below. Could anyone point out where I went wrong. I would suspect something to do with time but could not get it right. My data can be accessed from here. plot

data <- read.table(file="900days.txt", header=TRUE, sep="")
time<-data$time
temperature<-data$temperature

#lm fitting
xc<-cos(2*pi*time/366)
xs<-sin(2*pi*time/366)
fit.lm<-lm(temperature~xc+xs)
summary(fit.lm)
plot(temp~time, data=data, xlim=c(1, 900))
par(new=TRUE)
plot(fit.lm$fitted, type="l", col="red", xlim=c(1, 900), pch=19, ann=FALSE, xaxt="n",
yaxt="n")

#nls fitting
fit.nls<-nls(temp~C+alpha*sin(W*time+phi),
   start=list(C=27.63415, alpha=27.886, W=0.0652, phi=14.9286))
summary(fit.nls)
plot(fit.nls$fitted, type="l", col="red", xlim=c(1, 900), pch=19, ann=FALSE, xaxt="n", 
axt="n")
rcs
  • 67,191
  • 22
  • 172
  • 153
Eddie
  • 783
  • 4
  • 12
  • 24

4 Answers4

14

This is because the NA values are removed from the data to be fit (and your data has quite a few of them); hence, when you plot fit.lm$fitted the plot method is interpreting the index of that series as the 'x' values to plot it against.

Try this [note how I've changed variable names to prevent conflicts with the functions time and data (read this post)]:

Data <- read.table(file="900days.txt", header=TRUE, sep="")
Time <- Data$time 
temperature <- Data$temperature

xc<-cos(2*pi*Time/366)
xs<-sin(2*pi*Time/366)
fit.lm <- lm(temperature~xc+xs)

# access the fitted series (for plotting)
fit <- fitted(fit.lm)  

# find predictions for original time series
pred <- predict(fit.lm, newdata=data.frame(Time=Time))    

plot(temperature ~ Time, data= Data, xlim=c(1, 900))
lines(fit, col="red")
lines(Time, pred, col="blue")

This gives me:

enter image description here

Which is probably what you were hoping for.

Community
  • 1
  • 1
Andy Barbour
  • 8,783
  • 1
  • 24
  • 35
  • I just wondering how to refine so that the curve actually fit the maximum temperature values. – Eddie Nov 21 '13 at 17:22
  • @Eddie I'm not sure what you mean. Can you clarify? – Andy Barbour Nov 21 '13 at 18:04
  • 1
    Thanks again @Andy Barbour, I was hoping that the blue curve is running through some of the maximum values(e.g at 29.5 degree Celcius at the first peak). At the moment, the highest peak for the predicition(blue curve) is somewhere around 28.8 degree Celcius. I am not sure how to refine this in lm. For nls, I think I can achieve that by changing my initial parameters value(which is also a tricky part). :) – Eddie Nov 21 '13 at 19:21
  • Then I think you'd want to use the `'weights'` field; but, you may wish to ask over at http://stats.stackexchange.com/ – Andy Barbour Nov 22 '13 at 01:57
  • This only works when one period = 1 unit of t, is it possible to adjust it if one period is not 1 unit of t. Or is there another way to do this? I'm trying to fir a curve to this data: `t<-c(1,16,32,46,60,75,91,105, 121, 136, 152, 166, 182, 197, 213, 228, 244, 258, 274,289, 305, 319, 335, 350) y<-c(0.07675029,0.06820963,0.05715293,0.05327944,0.05299214,0.05741750,0.07288807,0.11433698,0.26188523,0.35214251,0.46357182,0.55140384,0.59743720,0.60758298,0.59527024,0.55041092,0.46916025,0.39096330,0.27733599,0.20236358,0.12148088,0.08689101,0.08020721,0.07943709)` – Liza Oct 06 '16 at 19:06
4

How about choosing an X and an Y while doing your line plot instead of just choosing the Y.

plot(time,predict(fit.nls),type="l", col="red", xlim=c(1, 900), pch=19, ann=FALSE, xaxt="n",
yaxt="n")

Also both lm and nls just give you the fitted points. So you must estimate the rest of the points in order to make a curve, a line plot. Since you are with nls and lm, perhaps the function predict maybe useful.

Usobi
  • 1,816
  • 4
  • 18
  • 25
1

Not sure if this might help - I get a similar fit using sine only:

y = amplitude * sin(pi * (x - center) / width) + Offset

amplitude =  2.0009690806953033E+00
center = -2.5813588834888215E+01
width =  1.8077550471975817E+02
Offset =  2.6872265116104828E+01

Fitting target of lowest sum of squared absolute error = 3.6755174406241423E+01

Degrees of freedom (error): 90
Degrees of freedom (regression): 3
Chi-squared: 36.7551744062
R-squared: 0.816419142696
R-squared adjusted: 0.810299780786
Model F-statistic: 133.415731033
Model F-statistic p-value: 1.11022302463e-16
Model log-likelihood: -89.2464811027
AIC: 1.98396768304
BIC: 2.09219299292
Root Mean Squared Error (RMSE): 0.625309918107

amplitude = 2.0009690806953033E+00
       std err squared: 1.03828E-02
       t-stat: 1.96374E+01
       p-stat: 0.00000E+00
       95% confidence intervals: [1.79853E+00, 2.20340E+00]
center = -2.5813588834888215E+01
       std err squared: 2.98349E+01
       t-stat: -4.72592E+00
       p-stat: 8.41245E-06
       95% confidence intervals: [-3.66651E+01, -1.49621E+01]
width = 1.8077550471975817E+02
       std err squared: 3.54835E+00
       t-stat: 9.59680E+01
       p-stat: 0.00000E+00
       95% confidence intervals: [1.77033E+02, 1.84518E+02]
Offset = 2.6872265116104828E+01
       std err squared: 5.15458E-03
       t-stat: 3.74289E+02
       p-stat: 0.00000E+00
       95% confidence intervals: [2.67296E+01, 2.70149E+01]

Coefficient Covariance Matrix
[ 0.02542366 0.01786683 -0.05016085 -0.00652111]
[ 1.78668314e-02 7.30548346e+01 -2.18160818e+01 1.24965136e-01]
[ -5.01608451e-02 -2.18160818e+01 8.68860810e+00 -1.27401806e-02]
[-0.00652111 0.12496514 -0.01274018 0.0126217 ]

James Phillips zunzun@zunzun.com

James Phillips
  • 4,526
  • 3
  • 13
  • 11
  • 1
    Thanks @James Phillips. Thats true. You might want to check this post http://stats.stackexchange.com/questions/60500/how-to-find-a-good-fit-for-semi-sinusoidal-model-in-r – Eddie Nov 27 '13 at 14:37
0

Alternatively, you could have eliminated the NAs from your data after reading it in:

data <- subset(data, !is.na(temperature))

Then, when plotting, you could set the x-axis to the time points from the reduced data set:

plot(temp~time, data=data, xlim=c(1, 900))
lines(x=time, y=fit.lm$fitted, col="red")

This curve won't be as smooth as the one produced by @andy-barbour but it will work in a pinch.

Hip Hop Physician
  • 252
  • 1
  • 3
  • 12