
I have a data frame (df) based on a national survey conducted every two years; the time period is 2010-14, and I filtered the df to keep only persons who appear at least twice. This gives me a panel df, but an unbalanced one.

I am running a regression to study which variables influence participation in complementary pension schemes (participation is voluntary in my country). I have already run a one-way fixed effects regression, and now I want to run a two-way fixed effects regression (both individual and time effects).

The individual variable is uid and the time variable is year. I used the plm package in R:

df.p <- plm.data(df, c("uid", "year"))

and ran the regression:

reg1 <- plm(pens ~ woman + age + I(age^2/100) + high + medium + nord + centre, model="within", effect="twoways", data=df.p)

where high and medium are dummies for education level, and nord and centre are dummies for geographic location. For simplicity, I omitted the other variables that are present in the original model (20 variables).

After at least one hour of computation, I ran the summary command:

summary(reg1)

After another hour of computation, I got the error:

Error in crossprod(t(X), beta) : non-conformable arguments

I therefore suspected a multicollinearity problem, so I checked the correlation matrix:

p1 <- with(df, data.frame(woman=woman, age=age, high=high, medium=medium, nord=nord, centre=centre))

round(cor(p1),3)

Note that I built the matrix using all the variables (omitted here for simplicity, as mentioned above). I did not find any notably high correlations. I also checked the variance inflation factors:

vif(p1)

and I got:

No variable from the 20 input variables has collinearity problem. 

At this point I suspect that the collinearity problem arises because I am running a two-way regression, but I don't know how to handle it.
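One possible way to check this suspicion is sketched below on simulated data. Everything in it is an assumption for illustration: the data frame `toy`, the regressor `income`, and all values are made up and are not the survey data. The idea is that pairwise correlations and VIFs on the raw variables cannot reveal collinearity with the fixed effects themselves. Fitting the two-way model with explicit `uid` and `year` dummies in base R `lm()`, with the dummies listed first, marks any regressor that the fixed effects fully absorb by returning `NA` for its coefficient.

```r
# Simulated unbalanced panel; all names and values are illustrative assumptions.
set.seed(1)
n_id <- 50
full <- expand.grid(uid = 1:n_id, year = c(2010, 2012, 2014))
birth <- sample(1950:1990, n_id, replace = TRUE)   # one birth year per person
woman_id <- rbinom(n_id, 1, 0.5)                   # one (time-invariant) value per person
full$age <- full$year - birth[full$uid]            # age moves one-for-one with year
full$woman <- woman_id[full$uid]
full$income <- rnorm(nrow(full))                   # a genuinely time-varying regressor
full$pens <- rbinom(nrow(full), 1, 0.3)

# Drop the 2014 wave for the first 10 people: unbalanced, everyone keeps >= 2 waves.
toy <- full[!(full$uid <= 10 & full$year == 2014), ]

# Two-way fixed effects as explicit dummies (LSDV). The dummies come first so that
# lm() drops the regressors, not the dummies, when columns are linearly dependent.
# This is only practical for small data.
fit <- lm(pens ~ factor(uid) + factor(year) + income + age + I(age^2/100) + woman,
          data = toy)

coefs <- coef(fit)[c("income", "age", "I(age^2/100)", "woman")]
print(coefs)

# Regressors whose coefficient is NA are absorbed by the individual/time effects.
absorbed <- names(coefs)[is.na(coefs)]
print(absorbed)
```

In this toy setup, `age` and `woman` are the absorbed regressors: `woman` never changes within a person, and `age` is an exact combination of the person effect (birth year) and the time effect (survey year). Whether the same holds for the real data would have to be checked there, for example by running the same kind of fit on a small subset of individuals.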

Thanks in advance.

Helix123
Laura R.
  • Have you tried running your analysis on a subset of your data? Perhaps try it with 100 subjects. This should let it run faster. Also, are the data public? If so, sharing a snippet of the data (via `dput`) would help to diagnose the problem – Benjamin Nov 03 '16 at 10:56
  • @Benjamin I tried with fewer observations as you suggested, but the error is still the same; the data are public, but unfortunately they are split across several tables and I created the df from the variables I needed. – Laura R. Nov 03 '16 at 15:18
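The subsetting and `dput` suggestion from the comments could look roughly like the sketch below. The data frame here is a made-up stand-in, not the survey data; only the column names `uid`, `year` and `pens` are taken from the question. The point is to sample whole individuals, not rows, so that every person kept still has all of their waves.

```r
# Hypothetical stand-in for the survey data; replace `df` with the real data frame.
set.seed(42)
df <- data.frame(
  uid  = rep(1:500, each = 3),
  year = rep(c(2010, 2012, 2014), times = 500),
  pens = rbinom(1500, 1, 0.3)
)

# Sample 100 individuals and keep every row that belongs to them.
keep_ids <- sample(unique(df$uid), 100)
df_small <- df[df$uid %in% keep_ids, ]

# Print a copy-pasteable text representation of the first rows for sharing.
dput(head(df_small, 10))
```

Running the model on `df_small` should be much faster, and the `dput` output can be pasted into the question so that others can reproduce the error.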
