Using R's nls, how to define the coefficients' individual formulae / constraints?

Question

I've got a model to be estimated through the non-linear least squares method. The model is specified in such a way that I have a main formula having 5 coefficients which in turn have their own formulas dependent on total of four coefficients. The first formula itself is linear, it is the inclusion of the coefficients' individual formulae that makes the model nonlinear.

The R's nls function asks for a formula without the coefficients. I have no idea how to define them then.

To make things crystal clear, I'll post the concrete formulas below. I'll use '_' for index and "{}" brackets for the content of such index if it involves any algebra. Syntax is very much TEX-like, variable names are just single characters, so I didn't bother placing "*" everywhere where multiplication occurs.

p_t = β_1p_{t-1} + β_2p_{t-2} + β_3d_{t-1} + β_4d_{t-2} + β_5d_{t-3} + γ

β_1 = 2 - b - c 
β_2 = -(1-c)(1-b) 
β_3 = cδ + bα 
β_4 = -cδ(1-b) - bα(2-c) 
β_5 = bα(1-c)

@gung Before I asked my question I did a quick search through the site and found that similar questions are posted and answered here (e.g. http://stats.stackexchange.com/questions/21565/how-do-i-fit-a-constrained-regression-in-r-so-that-coefficients-total-1 ). But I am new here so I don't know the exact post preferences, please excuse me if I am wrong. That's quite a practical issue, I thought it would be best to hear an answer from practicing statisticians. — infoholic_anonymous, Aug 16 '12 at 03:58
@Roland yes, sorry, before posting I didn't know this is forbidden. The question hasn't been answered though neither here nor on Stack Overflow. — infoholic_anonymous, Aug 16 '12 at 13:48
As I already commented on SO: You could wrap the equation system in a function. But this function would be recursive and I don't know if `nls` can handle recursive functions. Also, what are the values of $p_0$ and $p_1$? Do you have some example data for testing? — Roland, Aug 16 '12 at 15:07
@Roland These are natural logarithms of share prices at the beginning of year t and sum of dividends distributed during year t. The number indexes mean magnitudes of lag operators. Here is a data sample, however yet to be turned into logarithms: http://pastebin.com/hvYsD3Fx Thanks in advance! — infoholic_anonymous, Aug 22 '12 at 16:35

score 4 · Answer 1 · answered Aug 16 '12 at 04:21

4

What do you mean by "The R's nls function asks for a formula without the coefficients"! In R, the coefficients will be estimated by the nls. I think you can still do this by using nls. Write your formula in terms of b, c, $\alpha$, $\delta$ with some starting points for them using list.

answered Aug 16 '12 at 04:21

Stat

671
7
19

I added \$'s to make your Greek letters display correctly; I hope you don't mind. Just so you know, CV can use mathjax to render math equations etc. It works pretty much as you'd expect. There's also a "?" in the upper right corner that will help you navigate to more info, or you can ask about it on meta. – gung - Reinstate Monica Aug 16 '12 at 04:27
TO THOSE VOTING UP : I don't see how would such rewriting help. I might however misunderstand the R formula syntax, in which the model is to be passed to the nls. Please post the exact call for nls which you would suggest, that would prove this indeed is the solution, what I doubt. – infoholic_anonymous Aug 22 '12 at 16:39
to anonymous_infoholic: This is basically what @Roland did; see how he now has the function in terms of the desired parameters? You could perhaps also do this rewriting algebraically instead of with code. – Aaron left Stack Overflow Aug 22 '12 at 18:57

score 4 · Answer 2 · edited Jun 20 '20 at 09:12

4

I show you the principle. I am not sure I used the correct equations, but since you wrote about 4 parameters, I think this is what you mean.

I did not check for errors and leave that to you. You should also do the usual diagnostics that should be used when doing non-linear regression.

fitfun<-function(p,d,a,b,c,d1){    
  beta_1 <- 2-b-c 
  beta_2 <- -(1-c)*(1-b) 
  beta_3 <- c*d1 + b*a 
  beta_4 <- -c*d1*(1-b) - b*a*(2-c) 
  beta_5 <- b*a*(1-c) 
  res <- as.numeric(filter(p,c(0,beta_1,beta_2),sides=1)) + #use filter to calculate lagged values
         as.numeric(filter(d,c(0,beta_3,beta_4,beta_5),sides=1)) + c
  res[-(1:3)] #to remove NAs
  }

pfit<-df$p0[-(1:3)]
fit<-nls(pfit ~ fitfun(p0,d1,a,b,c,d),data=df,start=list(a=2,b=1,c=1,d=1))
summary(fit)

 Formula: pfit ~ fitfun(p0, d1, a, b, c, d)

Parameters:
  Estimate Std. Error t value Pr(>|t|)    
a  0.13858    0.62064   0.223    0.823    
b  0.21101    0.01594  13.237   <2e-16 ***
c  1.41492    0.03015  46.929   <2e-16 ***
d  0.09838    0.09098   1.081    0.280    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 6.799 on 1043 degrees of freedom

Number of iterations to convergence: 7 
Achieved convergence tolerance: 5.107e-06

Edit

I show quickly how filter works.

x <- 1:10
#[1]  1  2  3  4  5  6  7  8  9 10
filter(x,c(0,1),sides=1) #1*x_{i-1}
#[1] NA  1  2  3  4  5  6  7  8  9
filter(x,c(0,0,2),sides=1) #2*x_{i-2}
#[1] NA NA  2  4  6  8 10 12 14 16
filter(x,c(1,2,3),sides=1) #1*x_i + 2*x_{i-1} + 3*x_{i-2}
#[1] NA NA 10 16 22 28 34 40 46 52

filter is a time series function and thus pretty fast.

edited Jun 20 '20 at 09:12

Community

1
1

answered Aug 22 '12 at 17:53

Roland

127,288
10
191
288

I don't want to sound retarded, but in which line is this: `p_t = β_1*p_{t-1} + β_2*p_{t-2} + β_3*d_{t-1} + β_4*d_{t-2} + β_5*d_{t-3} + γ` included? Especially, which beta coefficient belongs to which variable among lagged p-s and d-s? Please forgive me, I don't have much experience in R apart from doing some very basic stuff. If you were as kind as to comment the code assuming I'm a total moron, I'd be very thankful. – infoholic_anonymous Aug 22 '12 at 21:27
1

It's the line where `res` is defined. I suggest you try to understand how `filter` works, e.g., by looking at the help page or by trying some simpler uses of this function. Of course you could rewrite `fitfun` without `filter`, but then you would miss out on learning this very useful function. – Roland Aug 23 '12 at 06:59
PS: I am not sure the p-values are correct since you are working with a time series and the assumption of independence is probably violated. But that is a stats question which would belong on Cross Validated. There might be better (and more correct) possibilities than using `nls` for this. – Roland Aug 23 '12 at 07:16
To be honest I don't understand how filter works at all. I've tried to read the help but I just don't get it. Do you happen to know where can I read about that in an even more basic way? | The model I provided was conceived by Gregory Chow and he suggested to optimize it this way in his paper. – infoholic_anonymous Sep 13 '12 at 00:21
Oh, now I see. Thank you very much. Finally, why is there + c at the end of the res formula? – infoholic_anonymous Sep 17 '12 at 00:08
and I'm not sure at all if it should be equal to c (in the beta definitions) or not. – Roland Sep 17 '12 at 07:24
It won't be equal. It is defined as b*c multiplied by a constant. How should I include it explicitly in my model function then? – infoholic_anonymous Sep 17 '12 at 12:29
Your comment demonstrates that you do not understand the code. You should not use it without understanding it, because I give no warranty for correctness. – Roland Sep 17 '12 at 13:08
I do. I understand your formulation of coefficients and model itself, including the filtering. My only doubt is about the residual. It definitely doesn't equal to c as you suggested. Quoting the paper I use, it is equal to the residual of the model expressed in terms of a,b,c,d multiplied by b*c. It is the (2)nd formula vs the (3)rd one [here](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.197.511&rep=rep1&type=pdf). I understand you may be already annoyed and really appreciate your help, but please do guide me through this very last step. – infoholic_anonymous Sep 17 '12 at 17:05
I am sorry, but I won't try to understand a paper on economics (in my spare time). Until now I assumed gamma to be a constant. Now you say it is the residuals? If so, what is the distribution of gamma? – Roland Sep 18 '12 at 07:29
It's plain math, nothing to be read aside from those formulas. If model is expressed in (\alpha,b,c,\delta) the remaining residual is \gamma_0. In the final form of the model, i.e. expressed in (\beta1, \beta2, \beta3, \beta4, \beta5), the residual is \gamma, which is b*c*\gamma_0. – infoholic_anonymous Sep 19 '12 at 23:15
I don't think, you mean the same by "residual" as I do. However, I'm sure you will be able to figure out how to adjust the code to your needs. – Roland Sep 20 '12 at 07:02

Using R's nls, how to define the coefficients' individual formulae / constraints?

2 Answers2

Edit