
In short: I'm trying to run a GMM estimation using the "pgmm" function (from the plm package) in R, in order to study the impact of corruption on public debt. When I try to estimate the model, I get the error "system is computationally singular".

The Variables:

debt <- dep. variable (Public Debt to GDP in %)

cpi <- ind. variable which I want to investigate (Corruption Perception Index)

edu <- ind. control variable (Secondary School enrollment ratio)

pol <- ind. control variable (Political stability Index)

exp <- ind. control variable (Governmental Expenses)

gdp <- ind. control variable (gdp/cap)

All variables except cpi and pol are in logs.

The data contain these indicators for around 120 countries over the period 1998 to 2016. Observations with "NA" are removed, which leaves 1232 observations for the regression.

I'm using an existing paper as orientation for this model. As it's my first dynamic panel model, I'm somewhat puzzled when it comes to the final regression.

The paper states: "The difference equation is instrumented with the lagged levels, two periods, of the dependent variable and the levels equation with the difference lagged one period."

So I went with the following code (I'm actually not 100% sure if this is what the Authors meant by the quote above):

gmm <- pgmm(debt ~ lag(debt, 1:2) + cpi + lag(exp, 0:1) + lag(pol, 0:1) +
              lag(gdp, 0:1) + lag(edu, 0:1) | lag(debt, 2:99),
            data = data3, effect = "twoways", model = "twosteps")

This produces the following error:

Error in solve.default(crossprod(WX, t(crossprod(WX, A1)))) : 
  system is computationally singular: reciprocal condition number = 9.6207e-21
In addition: Warning message:
In pgmm(debt ~ lag(debt, 1:2) + cpi + lag(exp, 0:1) + lag(pol, 0:1) +  :
  the first-step matrix is singular, a general inverse is used
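
One common reason for a singular matrix in this kind of model is an instrument count that is large relative to the number of countries. The sketch below is not pgmm's internal bookkeeping. It applies the textbook Arellano-Bond count to the lagged dependent variable only, and the helper `count_gmm_instruments` and its arguments are hypothetical names introduced for illustration. It assumes 19 yearly periods (1998 to 2016) and that the third period is the first usable one.

```r
# Hypothetical helper: number of lagged-level instruments for the dependent
# variable in a difference-GMM setup, summed over all usable periods.
count_gmm_instruments <- function(n_periods, min_lag = 2, max_lag = 99,
                                  first_period = 3) {
  periods <- first_period:n_periods
  # At period t, levels dated t - max_lag, ..., t - min_lag are candidates,
  # but only periods from 1 onwards exist in the data.
  newest <- periods - min_lag
  oldest <- pmax(1, periods - max_lag)
  sum(pmax(0, newest - oldest + 1))
}

all_lags    <- count_gmm_instruments(19)               # like lag(debt, 2:99)
capped_lags <- count_gmm_instruments(19, max_lag = 3)  # like lag(debt, 2:3)

cat("instruments with all lags:   ", all_lags, "\n")
cat("instruments with lags 2 to 3:", capped_lags, "\n")
```

If the count with all available lags is larger than the number of countries (around 120 here), and that is before regressors and time dummies are added, capping the lag depth is one thing worth trying.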

I searched for this problem, and high correlation between the variables often seemed to be the obstacle. The following table shows the correlations between the variables.

            debt        cpi         edu         gdp         exp         pol
debt  1.00000000 -0.1000317  0.06941532  0.01582022  0.15649933  0.03183785
cpi  -0.10003172  1.0000000 -0.54167403  0.03139960 -0.51025570 -0.78065946
edu   0.06941532 -0.5416740  1.00000000  0.04745409  0.38184303  0.49614498
gdp   0.01582022  0.0313996  0.04745409  1.00000000  0.02357436 -0.09799053
exp   0.15649933 -0.5102557  0.38184303  0.02357436  1.00000000  0.52357420
pol   0.03183785 -0.7806595  0.49614498 -0.09799053  0.52357420  1.00000000

There are indeed a few high values, so I ran the regression again leaving out particular variables, but the warnings still appeared.
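
A pairwise correlation table can only reveal near-collinearity between two columns. An exact linear dependence that involves several columns, for example a regressor and the set of time dummies, does not show up there. A rank check on the design matrix is one way to look for it. The following is a minimal sketch on simulated data; the panel, the variable `trend`, and all numbers are assumptions made up for the illustration, not the data from this question.

```r
set.seed(1)

# Simulated balanced panel (hypothetical, not the poster's data).
panel <- expand.grid(country = 1:20, year = 1998:2016)
n <- nrow(panel)
panel$cpi   <- rnorm(n)
panel$pol   <- -0.8 * panel$cpi + rnorm(n, sd = 0.6)  # correlated with cpi, not exactly
panel$trend <- log(panel$year)                         # varies over time only

# Pairwise correlations involving trend look harmless.
print(round(cor(panel[, c("cpi", "pol", "trend")]), 2))

# Design matrix including year dummies.
X <- model.matrix(~ cpi + pol + trend + factor(year), data = panel)
n_cols <- ncol(X)
x_rank <- qr(X)$rank
cat("columns:", n_cols, " rank:", x_rank, "\n")
```

A rank below the number of columns means at least one column is an exact linear combination of the others, whatever the correlation table says.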

In case the data itself is problematic, here are a few example lines of the CSV file:

"","country","year","debt","cpi","edu","gdp","exp","pol"
"3","Albania","2002",4.16044436392662,7.5,4.29374171980631,7.60190195987517,2.41323161308111,3.21
"4","Albania","2003",4.09767235231478,7.5,4.32585302986794,7.60240133566582,2.38784493694487,3.19
"5","Albania","2004",4.0517849478033,7.5,4.31988523813603,7.60290046220476,2.39607543608138,3.07
"6","Albania","2005",4.06388535473739,7.6,4.36054760299676,7.60339933974067,2.38508631450579,2.99
....
"1388","Yemen","2010",3.74714836223791,7.8,3.7716108517114,7.60589000105312,2.47232786758114,1.08
"1389","Yemen","2011",3.82209829790016,7.92,3.81793208202855,7.60638738977265,2.54944517092557,1.07
"1390","Yemen","2012",3.85651029549789,7.7,3.83449380291891,7.60688453121963,2.67138621673062,1.07
"1391","Yemen","2013",3.87535902105655,8.2,3.88424062441569,7.60738142563979,2.57184857992181,1.13
"1393","Zimbabwe","2012",3.81330703248899,8,3.83773040084629,7.60688453121963,3.17971910966701,2.72
"1394","Zimbabwe","2013",3.87743156065853,7.9,3.85248529271195,7.60738142563979,3.12588295801904,2.83

I am sorry if I am missing some crucial mistake, but I have been struggling with this problem for a few days now. GMM is entirely new territory for me and I would really appreciate some help :)

Greetings from a frustrated student
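
One thing that stands out in the CSV excerpt in the question: the gdp column has the same value for different countries in the same year (Yemen and Zimbabwe in 2012 and 2013), and the values appear to equal the natural log of the year. The check below uses only rows copied from that excerpt. Whether the same holds in the full data set is an assumption that would need to be verified on data3. If it does, gdp carries no cross-country information and would be collinear with the time dummies that effect = "twoways" adds.

```r
# Rows copied from the CSV excerpt in the question (country, year, gdp only).
rows <- data.frame(
  country = c("Albania", "Albania", "Albania",
              "Yemen", "Yemen", "Zimbabwe", "Zimbabwe"),
  year    = c(2002, 2003, 2004, 2012, 2013, 2012, 2013),
  gdp     = c(7.60190195987517, 7.60240133566582, 7.60290046220476,
              7.60688453121963, 7.60738142563979,
              7.60688453121963, 7.60738142563979)
)

# Number of distinct gdp values per year (1 means no cross-country variation).
distinct_per_year <- tapply(rows$gdp, rows$year,
                            function(v) length(unique(v)))
print(distinct_per_year)

# Largest gap between gdp and log(year) in these rows.
max_gap <- max(abs(rows$gdp - log(rows$year)))
print(max_gap)
```

Running the same two checks on the full data frame would show whether this is a quirk of the excerpt or a problem in how the gdp variable was constructed.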


2 Answers

0

I have encountered this problem a bunch of times as well. Although I do not fully understand GMM and how it works in plm, you could try modifying some aspects of the model. These could be substantive changes (the variables you use, or interactions that may produce collinearity and then lead to computational failure), but sometimes minor alterations to how the coefficients are computed help a lot, too.

For example, you can change the first-step matrix by setting the fsm argument of pgmm to another value. However, you should probably read more about how this change may affect the results.
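
As a rough illustration of both kinds of change, the sketch below caps the instrument lags and sets fsm explicitly. It uses the EmplUK data set that ships with plm as a stand-in, since data3 is not available here, so the formula is an assumption and not the specification from the question. The fsm values accepted by pgmm are "I", "G", "GI" and "full".

```r
library(plm)

# EmplUK ships with plm; it stands in for the poster's data3 here.
data("EmplUK", package = "plm")

fit <- pgmm(
  log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + log(capital) |
    lag(log(emp), 2:4),   # capped instrument lags instead of 2:99
  data   = EmplUK,
  effect = "twoways",
  model  = "twosteps",
  fsm    = "I"            # first-step matrix; alternatives: "G", "GI", "full"
)

print(summary(fit, robust = TRUE))
```

pgmm also has a transformation argument: "d" (the default) estimates the model in differences, while "ld" combines the level and difference equations, which seems closer to the instrument description quoted from the paper in the question.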

0

You are using a two-way estimator (effect = "twoways"):

  1. Using time-invariant variables can generate a singular matrix.
  2. I think that if you use time-varying variables this problem is strongly reduced.
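
To check which variables fall into this category, one can test whether a variable is constant within each country (it then drops out when the model is differenced) or constant within each year (it is then collinear with time dummies). This is a minimal sketch on a simulated panel; the helper `constant_within` and the variables `area`, `trend` and `x` are hypothetical names used only for the example.

```r
# Simulated panel (hypothetical): 5 countries, 4 years.
set.seed(42)
panel <- expand.grid(country = 1:5, year = 2001:2004)
panel$area  <- c(10, 20, 30, 40, 50)[panel$country]  # constant within country
panel$trend <- log(panel$year)                        # constant within year
panel$x     <- rnorm(nrow(panel))                     # varies in both dimensions

# TRUE if x takes a single value inside every group.
constant_within <- function(x, group) {
  all(tapply(x, group, function(v) length(unique(v)) == 1))
}

vars <- c("area", "trend", "x")
check <- data.frame(
  variable         = vars,
  fixed_in_country = vapply(vars, function(v)
    constant_within(panel[[v]], panel$country), logical(1)),
  fixed_in_year    = vapply(vars, function(v)
    constant_within(panel[[v]], panel$year), logical(1)),
  row.names = NULL
)
print(check)
```

Applied to the real data, the same helper could be called on each regressor with the country and year columns of the data frame as the grouping variables.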
mrk
Luigi Biagini