0

My dataset is quite big, so I'm using just 10 rows of data as an example. (I've worked out the answer in Excel but can't replicate it in R, so I need help with the code.)

constant<-c(6.10,5.12,5.04,4.97,4.89,4.89,4.87,4.87,4.88,4.99)
years.star<-c(219.87,153.69,146.19,139.35,127.27,127.27,121.91,121.91,112.28,99.98)
years.sq.star<-c(7915.41,4610.71,4239.78,3901.93,3309.27,3309.27,3047.95,3047.95,2582.58,1999.62)
ln.salary<-c(28.43,23.12,21.59,21.44,22.71,23.33,20.29,21.76,21.48,22.92)

try<-data.frame(constant,years.star,years.sq.star,ln.salary)

ln.salary is the dependent variable. The answer you should get is:

intercept-  6.474922
beta1-      -0.15026
beta2-      0.002769

My problem is that R's lm function does not know that my intercept column has the values above. It just uses 1,1,1,1,1,1,1,1,1,1 instead of 6.10, 5.12, etc.

So test<-lm(ln.salary~years.star+years.sq.star,data=try,weights=constant)

does not work because it will just generate this answer:

intercept-   207.1706
beta1-       -3.13214
beta2-        0.064416

In essence, I've taken the data and tried to adjust for heteroscedasticity. In the final step, I have my transformed constant (constant star) and my transformed x variables. The last step is to regress ln.salary on the constant and the x variables, which should give the answer shown above.

I can do it in Excel but not in R, and I know I'm not getting the code right. I know the problem is that lm generates its own intercept column (1,1,1,...). Could you please help?

Kind regards D

user3497385

2 Answers

1

If you want to "fix" an intercept at a particular constant, you should subtract the value of that constant from the response, and then fit a no-intercept model. For example

test <- lm( ln.salary - 6.474922 ~ years.star + years.sq.star + 0,
    data=try, weights=constant)

Here we subtract off the intercept term, and then we add +0 to the formula to indicate not to fit an intercept term. With that model I get

Call:
lm(formula = ln.salary - 6.474922 ~ years.star + years.sq.star + 
    0, data = try, weights = constant)

Coefficients:
   years.star  years.sq.star  
     0.197384      -0.002842  
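
The subtraction above can also be written with an offset, which leaves the response untouched. The following is only a minimal sketch of that equivalence, assuming the data from the question; the names `dat`, `fixed`, `fit_sub` and `fit_off` are chosen here for illustration and do not appear in the original post.

```r
# Data copied from the question; `dat` is an illustrative name.
dat <- data.frame(
  constant      = c(6.10, 5.12, 5.04, 4.97, 4.89, 4.89, 4.87, 4.87, 4.88, 4.99),
  years.star    = c(219.87, 153.69, 146.19, 139.35, 127.27, 127.27,
                    121.91, 121.91, 112.28, 99.98),
  years.sq.star = c(7915.41, 4610.71, 4239.78, 3901.93, 3309.27, 3309.27,
                    3047.95, 3047.95, 2582.58, 1999.62),
  ln.salary     = c(28.43, 23.12, 21.59, 21.44, 22.71, 23.33,
                    20.29, 21.76, 21.48, 22.92)
)

fixed <- 6.474922  # intercept value to hold fixed (taken from the question)

# Version 1: subtract the fixed value from the response, no fitted intercept.
fit_sub <- lm(ln.salary - fixed ~ years.star + years.sq.star + 0,
              data = dat, weights = constant)

# Version 2: keep the response as-is and pass the same value as an offset.
fit_off <- lm(ln.salary ~ years.star + years.sq.star + 0,
              data = dat, weights = constant,
              offset = rep(fixed, nrow(dat)))

# The two columns should match, since lm() subtracts the offset from the
# response before fitting.
print(cbind(subtract = coef(fit_sub), offset = coef(fit_off)))
```

The offset form keeps the fitted values on the original scale of the response, which can be more convenient when calling `predict()` afterwards.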
MrFlick
  • +1. You actually don't need the `I(...)` if the expression is on the left side of `~` – Señor O Jun 04 '14 at 19:21
  • Thanks @SeñorO, I removed it. – MrFlick Jun 04 '14 at 19:22
  • Hi there Mr Flick. I've figured it out. We just need to say lm(ln.salary~constant+years.star+years.sq.star+0,data=try) and it works. You should get the answer above. What do you think? – user3497385 Jun 04 '14 at 19:34
  • I've dealt with a lot of Weighted Least Squares examples and then reproduced the correct answers in Excel using covariance matrices etc., so I'm pretty confident in the answer. I think we have got it now. Try my latest code and see if you agree. Subtracting the intercept worked like a charm. Thanks man. – user3497385 Jun 04 '14 at 19:37
  • @user3497385 That's not a weighted OLS regression then, as the title suggests. Also beware that the model will attach a coefficient to `constant` the way you wrote it. – Señor O Jun 04 '14 at 19:43
  • I'd like to thank you for pointing out the + 0 and find your answer useful. thanks again buddy. – user3497385 Jun 04 '14 at 19:43
  • Well, perhaps I should put Feasible generalised least squares? I ran the original regression, got the residuals. Took the absolute values of the residuals and regressed these on the x variables. Then I took the predicted values from that regression and divided all the original x variables by this. The constant is 1 divided by the predicted values. These give you the *star transformed variables. Hope I'm making sense here. – user3497385 Jun 04 '14 at 19:45
  • OLS or Ordinary Least Squares is the best term. If you have "weights", that means you want the algorithm to make some points more important than others. – Señor O Jun 04 '14 at 19:47
  • The model needs an intercept. The 6.47 is the right answer. I'd be happy to send you the Excel data and my formulas if you would like to see them. – user3497385 Jun 04 '14 at 19:49
  • Thank you Senor, I do appreciate your feedback. – user3497385 Jun 04 '14 at 19:50
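
The approach the asker settles on in the comments above (treat `constant` as an ordinary regressor and drop lm's automatic intercept with `+ 0`) can be sketched as follows. This is a hedged illustration, not the asker's Excel workbook: it assumes every starred column is the original variable multiplied by `constant`, and the names `dat`, `raw`, `fit_star` and `fit_wls` are invented here. Under that assumption, the same coefficients should also come out of a weighted fit on the untransformed variables with weights `constant^2`. As Señor O notes, the coefficient attached to `constant` is estimated by the model; it plays the role of the intercept reported in the question.

```r
# Data copied from the question; `dat` is an illustrative name.
dat <- data.frame(
  constant      = c(6.10, 5.12, 5.04, 4.97, 4.89, 4.89, 4.87, 4.87, 4.88, 4.99),
  years.star    = c(219.87, 153.69, 146.19, 139.35, 127.27, 127.27,
                    121.91, 121.91, 112.28, 99.98),
  years.sq.star = c(7915.41, 4610.71, 4239.78, 3901.93, 3309.27, 3309.27,
                    3047.95, 3047.95, 2582.58, 1999.62),
  ln.salary     = c(28.43, 23.12, 21.59, 21.44, 22.71, 23.33,
                    20.29, 21.76, 21.48, 22.92)
)

# Regress on the transformed columns. `constant` is an ordinary regressor and
# `+ 0` suppresses lm()'s own column of ones.
fit_star <- lm(ln.salary ~ constant + years.star + years.sq.star + 0, data = dat)

# Undo the transformation (assumption: starred column = original * constant),
# then fit weighted least squares with weights = constant^2.
raw <- with(dat, data.frame(
  years    = years.star / constant,
  years.sq = years.sq.star / constant,
  ln.sal   = ln.salary / constant,
  w        = constant^2
))
fit_wls <- lm(ln.sal ~ years + years.sq, data = raw, weights = w)

# Both fits should report the same three numbers, in the same order.
print(coef(fit_star))
print(coef(fit_wls))
```

If the assumption about how the starred columns were built holds, this also shows why `weights=constant` in the original attempt did not reproduce the Excel result: the matching weight would be the square of `constant`, applied to the untransformed variables.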
0

If you want varying "intercepts" for each row, then you need to use an 'offset' rather than a 'weight':

 test<-lm(ln.salary~years.star+years.sq.star+0,data=try,offset=constant)

Call:
lm(formula = ln.salary ~ years.star + years.sq.star + 0, data = try, 
    offset = constant)

Coefficients:
   years.star  years.sq.star  
     0.236355      -0.003881  

I'm not very concerned that this doesn't agree with Excel; its linear regression routine is known to be rather flaky. If, on the other hand, you are sure you need to use weights, then you should clarify which of the three possible interpretations of the term is being used (replication, sampling, or inverse variance). lm interprets a "weight" in the inverse-variance sense (its help page describes weights as being "inversely proportional to the variance"), so if those "constant" terms are variances, then perhaps you want:

> (test<-lm(ln.salary~years.star+years.sq.star+0, data=try, weights=1/constant) )

Call:
lm(formula = ln.salary ~ years.star + years.sq.star + 0, data = try, 
    weights = 1/constant)

Coefficients:
   years.star  years.sq.star  
     0.309391      -0.005189  
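
To make the difference between two of those interpretations concrete, here is a small sketch on toy data invented for this illustration (it is not the asker's data, and the variable names are hypothetical). With integer weights, lm's inverse-variance weighting and literal row replication should give the same point estimates, but they disagree about the residual degrees of freedom and therefore about the standard errors.

```r
# Toy data invented for this illustration (not from the question).
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
w <- c(1, 2, 1, 3, 1)  # integer weights, so rows can be literally repeated

# lm() reads `weights` as inverse-variance (precision) weights.
fit_w <- lm(y ~ x, weights = w)

# Replication reading: physically repeat row i exactly w[i] times.
rep_dat <- data.frame(x = rep(x, times = w), y = rep(y, times = w))
fit_rep <- lm(y ~ x, data = rep_dat)

# Same point estimates under both readings.
print(rbind(weighted = coef(fit_w), replicated = coef(fit_rep)))

# Different residual degrees of freedom (n - 2 with n = 5 versus n = 8),
# so the reported standard errors differ.
print(c(weighted = df.residual(fit_w), replicated = df.residual(fit_rep)))
```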
IRTFM
  • Hi there BondedDust. Thanks, but I realized I don't need to use weights or an offset. I just had to remove the intercept and add my own intercept (which was constant), and it worked. As for Excel, it's cumbersome and I agree with what you are saying, but the formulas did give me the right answer. I'd be happy to share them with you if you wish. – user3497385 Jun 04 '14 at 20:01