I have a model that contains a time trend variable covering 7 years (2000 = 1, 2001 = 2, ..., 2006 = 7) as well as dummy variables for 6 of the years (one binary variable for each year except 2000). When I ask R to fit this linear model:

olsmodel=lm(lnyield ~ lnx1+ lnx2+ lnx3+ lnx4+ lnx5+ x6+ x7+ x8+ timetrend+ 
                     yeardummy2001+ yeardummy2002+ yeardummy2003+ yeardummy2004+ 
                     yeardummy2005+ yeardummy2006)

the model summary shows NA for the last dummy variable, along with the message "Coefficients: (1 not defined because of singularities)".

I do not know why this is happening, as all of the x_i variables are continuous and, as far as I can tell, no subset of the dummies and the time trend is a linear combination of the others.

Any help as to why this might be happening would be much appreciated!

IRTFM
user1836894
  • Beware of identifiability when fitting a model with dummy variables. The variable `years` has 7-1=6 degrees of freedom, so only 6 coefficients can be estimated for it. In R, the default is to take the first category as the baseline, i.e. to fix its coefficient at 0. – liuminzhao Nov 19 '12 at 20:44
  • When fitting the dummy variables I did exclude one of the years. I fit dummies for (2001,...,2006) and did not include one for 2000. Is this what you meant? – user1836894 Nov 19 '12 at 20:58
  • Sorry, I initially misunderstood your question. I will post an answer for you now. – liuminzhao Nov 19 '12 at 21:13

1 Answer

The problem is that when you define the year trend as 1:n and also include a dummy variable for every year except the baseline, the covariate (design) matrix no longer has full column rank:

Say there are only 3 categories, r1, r2 and r3, with two observations each. The model is y ~ trend + c2 + c3, and the covariate matrix is:

> mat
     int trend c2 c3
[1,]   1     1  0  0
[2,]   1     1  0  0
[3,]   1     2  1  0
[4,]   1     2  1  0
[5,]   1     3  0  1
[6,]   1     3  0  1

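The column rank of mat is only 3, not 4 (the number of coefficients to estimate), because trend = int + c2 + 2*c3. Equivalently, t(mat) %*% mat is singular. The same thing happens in your model, where timetrend = 1 + yeardummy2001 + 2*yeardummy2002 + ... + 6*yeardummy2006. Since the coefficients cannot all be estimated, lm() drops one of the redundant columns (here the last dummy) and reports NA for it, which is what the "not defined because of singularities" message refers to.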
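The rank deficiency can be checked directly in R. The following is a minimal sketch of the 3-category example above; the response `y` is made-up random data, used only so that `lm()` has something to fit.

```r
# Rebuild the design matrix from the 3-category example (two rows per year).
year <- rep(1:3, each = 2)
mat <- cbind(int   = 1,
             trend = year,
             c2    = as.numeric(year == 2),
             c3    = as.numeric(year == 3))

qr(mat)$rank   # 3: the numerical rank
ncol(mat)      # 4: the number of coefficients requested

# The trend column is an exact linear combination of the other columns.
all.equal(mat[, "trend"], mat[, "int"] + mat[, "c2"] + 2 * mat[, "c3"])

# Fitting with a hypothetical response: the redundant last column comes back as NA.
set.seed(1)
y <- rnorm(6)                                # made-up data, illustration only
fit <- lm(y ~ mat[, "trend"] + mat[, "c2"] + mat[, "c3"])
coef(fit)
```

Calling `alias(fit)` on such a model lists which coefficient is aliased and how it depends on the others.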

liuminzhao
  • Awesome, that makes sense. Thank you for taking the time to answer! :) If I could trouble you for one more thing, would you have any recommendations of how to change things so I can still keep all the dummies along with the trend variable? – user1836894 Nov 19 '12 at 21:36
  • From my personal experience, I don't recommend keeping both. You can treat the variable as either numerical or categorical, but not both. It depends on what you are interested in. For me, if I saw an obvious trend over the years, I would take it as `1:n`. If you keep both, they are confounded with each other. Another option is to treat it as an ordinal variable, which is categorical data with an order. – liuminzhao Nov 19 '12 at 22:04
  • Thanks for the feedback man. I appreciate you taking the time. I will most likely just keep the trend variable and remove the dummies. – user1836894 Nov 19 '12 at 22:32
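
The two alternatives discussed in these comments (year as a numeric trend, or year as a categorical variable) can be sketched as follows. This is only an illustration on simulated data: the data frame `d`, the slope of 0.1 and the noise level are assumptions, and the other regressors from the question are left out for brevity.

```r
# Simulated stand-in for the real data: 7 years, 5 observations per year.
set.seed(1)
d <- data.frame(year = rep(2000:2006, each = 5))
d$timetrend <- d$year - 1999                               # 2000 = 1, ..., 2006 = 7
d$lnyield <- 0.1 * d$timetrend + rnorm(nrow(d), sd = 0.2)  # made-up response

# Option 1: year as a numeric trend (one slope).
fit_trend <- lm(lnyield ~ timetrend, data = d)

# Option 2: year as a categorical variable (2000 is the baseline level).
fit_dummy <- lm(lnyield ~ factor(year), data = d)

# Both at once reproduces the problem: one coefficient is not estimable.
fit_both <- lm(lnyield ~ timetrend + factor(year), data = d)
sum(is.na(coef(fit_both)))

# The trend model is nested in the dummy model, so an F-test can check
# whether the year effects depart from a straight line.
anova(fit_trend, fit_dummy)
```

If the F-test gives no evidence against the trend-only model, keeping just `timetrend` is the more parsimonious choice.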