Using nls in R to re-create research

Question

I am learning how to use the nls function in R and am having some issues. I am simply attempting to recreate the curve found in a research paper for now. The model fits a curve to the the stock market movements prior to the 1987 crash.

I have defined a function, func, as follows:

func <- function(a,b,tc,t){
 a+b*log(tc-t)
}

I have called nls like this:

nls1 <- nls(Y ~ func(a,b,tc,t), data2, start=list(a=0, b=1, tc=1466, t=1))

data2 is a data frame that consists of two columns, one is the date, the other is a value. There are 1466 rows.

head(data2)
 Date      Y
1  1/4/82 882.52
2  1/5/82 865.30
3  1/6/82 861.02
4  1/7/82 861.78
5  1/8/82 866.53
6 1/11/82 850.46

I get the following messages when I run nls,

Error in qr(.swts * attr(rhs, "gradient")) : 
  dims [product 4] do not match the length of object [1466]

In addition: Warning message:

In .swts * attr(rhs, "gradient") :
  longer object length is not a multiple of shorter object length

From what I can gather, this is a problem with the way the data frame is set up but I cannot find a solution.

Any idea how I can move this father along?

Thank you very much for your help.

jlhoward · Answer 1 · 2013-12-30T22:08:31.523

The basic problem is that you have not specified an independent variable. By specifying start(...) for a, b, tc, and t, you are telling nls(...) that these are all parameters of the model.

It looks like you are using a simplified version of the LPPL model, in which a, b, and tc are parameters, and t is the independent variable. It looks like data2$Date contains the time variable. You need to make sure the data2$Date is of class POSIXct. So you could write:

df$Date <- as.POSIXct(df$Date, format="%m/%d/%y")
nls1 <- nls(Y~a+b*log(tc-Date), data=data2, start=list(a=0, b=1, tc=1466))

EDIT: In response to OP's comments

This is a great question, because it illustrates several issues in using nls(...). The problem you are having (now that the model is specified correctly) is that nls(...) is not converging - a distressingly common occurrence. Basically, unless your starting parameter estimates are relatively close to the final, fitted values (or unless the model is extremely "well behaved"), nls will fail. [Note also that the paper you cite mentions that b is restricted to b < 0, whereas you are starting with b=1.] So what to do?

The minpack.lm(...) function in package minpack uses the exceptionally robust Levenberg-Marquardt algorithm for non-linear least squares estimation. In fact, the paper you cite mentions L-M specifically. The problem with minpack.lm(...) is that it is much more difficult to use (you have to define a function which returns the residuals at a given step, rather than just defining the function to fit). Plus, minpack.lm(...) does not calculate statistics of the fit.

So the solution is to use them both! Use minpack.lm(...) to estimate the parameters, then use those as "starting values" in nls(...). The code below does that. Having a model fitted using nls(...) will make it much easier to generate statistics of the fit, predicted values, residuals, and also to apply the model to new datasets.

# this section just grabs the DJIA for 1982 - 1987; you already have this
library(tseries)
library(zoo)
ts <- get.hist.quote(instrument="DJIA", 
                     start="1982-01-01", end="1987-08-01", 
                     quote="Close", provider="yahoo", origin="1970-01-01",
                     compression="d", retclass="zoo")
df <- data.frame(ts)
df <- data.frame(Date=as.Date(rownames(df)),Y=df$Close)
df <- df[!is.na(df$Y),]
# end of setup...
library(minpack.lm) # for nls.lm(...)
library(ggplot2)    # for ggplot
df$days <- as.numeric(df$Date - df[1,]$Date)
# model based on a list of parameters
f <- function(pars, xx) {pars$a + pars$b*log(pars$tc - xx)} 
# residual function
resids <- function(p, observed, xx) {df$Y - f(p,xx)}
# fit using Levenberg-Marquardt algorithm
nls.out <- nls.lm(par=list(a=1,b=-1,tc=5000), fn = resids, 
                  observed = df$Y, xx = df$days)
# use output of L-M algorithm as starting estimates in nls(...)
par <- nls.out$par
nls.final <- nls(Y~a+b*log(tc-days),data=df, 
                 start=c(a=par$a, b=par$b, tc=par$tc))
summary(nls.final)      # display statistics of the fit 
# append fitted values to df
df$pred <- predict(nls.final)
# plot the results
ggplot(df)+
  geom_line(aes(x=Date,y=Y),color="black")+
  geom_line(aes(x=Date,y=pred),color="blue",linetype=2)+
  labs(title="LPPL Model Applied to DJIA (1982 - 1987)",
       x="", y="DJIA (daily close)")+
  theme(plot.title=element_text(face="bold"))

Excellent suggestion, thank you. It has moved me along. When I run the nls1 line, I get the following error, Error in `-.POSIXt`(tc, Date) : can only subtract from "POSIXt" objects. I imagine this has to do with the fact that tc is an integer, not a date. — mks212, Dec 30 '13 at 15:16
Regarding the methodology and why this particular equation was chosen, here is the academic article I am working off of that explains the rationale, http://www.chronostraders.com/wp-content/uploads/2013/08/LP-clusters.pdf — mks212, Dec 30 '13 at 15:29
@user2926358 - see my edits above re: why are you still getting errors. — jlhoward, Dec 30 '13 at 21:53
It's funny how this works sometimes. I have spent a good portion of the day research LPPL modeling and see the issues that you raised. I initially was going to try the Levenberg-Marquardt algorithm you mentioned, but had issues installing minpack so decided to try to keep it simple. With all of th reading I have done today, it is clear that using LPPL is a complex problem. Thank you for your comments, I will begin working with it shortly. — mks212, Dec 30 '13 at 22:23
One quick note on the code above, when I first opened R a few months ago, the course I was taking mentioned !is.na. I found this confusing as a newcomer, meaning I couldn't get it to work, and instead used na.omit. It seems more straightforward to me. — mks212, Dec 30 '13 at 22:28
Alright, I think I need to call it a day. I did get to the end of this using S&P 500 data. Dow Jones data is not available to be downloaded from Yahoo, some sort of licensing issue. I will look at this some more tomorrow and substitute in the Dow Jones data I downloaded. Thanks for your help. — mks212, Dec 31 '13 at 01:15
Note that `nls` will work if you ensure that the argument to log cannot go negative and use the `"plinear"` algorithm which only requires starting values for the non-linear parameters (in this case `tc`). `nls(Y ~ cbind(1, log(abs(tc - days))), df, start = c(tc = 5000), alg = "plinear")` — G. Grothendieck, Mar 10 '17 at 14:38

score 1 · Answer 2 · answered Dec 30 '13 at 06:26

Normally, when one performs a least squares regression, the assumption is that there is a so-called "dependent" or "response" variable (Y, in your case), which is a function of one or more "independent" or "predictor" variables (Date), and typically the detailed specification of the predictor function itself is usually defined by a fairly small number of static parameters (a and b, and possibly also t and/or tc as well, depending on what exactly it is that you are trying to achieve). The job of the nls() function is to find the optimal values for those static parameters which would lead to the most accurate possible prediction.

The input to your predictor function func appears to be missing the required independent variable. So, I think you probably need to do one of two things. Either you modify func so that it accepts Date as an input, or else you change the Date column label in your data frame so that the name matches one of the func inputs (most likely I suspect that you would want to rename the Date column so that it corresponds to tc). In either case, if you want to perform a calculation in which you subtract a date value in the data frame from a fixed offset date (e.g., (tc - t) as it appears to be written right now), you will need to check that R is actually recognizing your dates as Date objects and not as strings, so that it knows how to meaningfully subtract one from the other. The as.Date() function may be helpful to you for this purpose.

As an additional alternative, rather than trying to rewrite func so that it accepts R Date objects as inputs, you may find it simpler just to reassign the Date column in the data frame to an elapsed integer number of days with reference to some offset; e.g., do something like:

data2$tc <- as.numeric(as.Date(data2$Date) - as.Date("1982-1-4"))

or similar.

This is very helpful, thank you. I went with your 3rd option as that is the simplest. I did away with func altogether and used this from jlhoward's suggestion above. nls1 <- nls(value~a+b*log(tc-t), data=data2, start=list(a=882, b=1, tc=1000)). I receveid the following error, Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model — mks212, Dec 30 '13 at 15:19

Using nls in R to re-create research

2 Answers2

Linked