In short: I'm trying to run a GMM estimation using the pgmm function from the plm package in R. The goal is to study the impact of corruption on public debt. When I run the regression, I get the error "system is computationally singular".
The Variables:
debt <- dep. variable (Public Debt to GDP in %)
cpi <- ind. variable which I want to investigate (Corruption Perception Index)
edu <- ind. control variable (Secondary school enrollment ratio)
pol <- ind. control variable (Political Stability Index)
exp <- ind. control variable (Government expenditure)
gdp <- ind. control variable (GDP per capita)
All variables except cpi and pol are in logs.
The data contain these indicators for around 120 countries over the period 1998 to 2016. Observations with NA values are removed, which leaves 1232 observations for the regression.
I'm using an existing paper as orientation for this model. As it's my first dynamic panel model, I'm somewhat puzzled when it comes to the final regression.
The Paper quotes: "The difference equation is instrumented with the lagged levels, two periods, of the dependent variable and the levels equation with the difference lagged one period."
So I went with the following code (I'm actually not 100% sure whether this is what the authors meant by the quote above):
library(plm)  # provides pgmm()

gmm <- pgmm(debt ~ lag(debt, 1:2) + cpi + lag(exp, 0:1) + lag(pol, 0:1) +
              lag(gdp, 0:1) + lag(edu, 0:1) | lag(debt, 2:99),
            data = data3, effect = "twoways", model = "twosteps")
which produces the following error:
Error in solve.default(crossprod(WX, t(crossprod(WX, A1)))) :
system is computationally singular: reciprocal condition number = 9.6207e-21
In addition: Warning message:
In pgmm(debt ~ lag(debt, 1:2) + cpi + lag(exp, 0:1) + lag(pol, 0:1) + :
the first-step matrix is singular, a general inverse is used
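The number of instruments may also matter here. Below is a rough sketch in base R that counts the GMM-type instruments generated by the lag(debt, ...) term. It assumes a balanced panel with 19 yearly periods (1998 to 2016) and the usual difference-GMM layout, in which period t is instrumented by every available level of debt within the requested lag window. count_gmm_instruments is a hypothetical helper, not part of plm, and the number pgmm actually builds can differ because the panel is unbalanced and the lagged regressors cost further periods.

```r
# Hypothetical helper (not part of plm): rough count of GMM-type instruments
# for one variable in the differenced equation of a balanced panel.
count_gmm_instruments <- function(n_periods, lag_min, lag_max) {
  # The first usable period is the one where the shortest lag exists.
  t_index <- (lag_min + 1):n_periods
  # In period t, levels dated t - lag_min back to period 1 are available,
  # capped at lag_max lags.
  per_period <- pmin(t_index - 1, lag_max) - lag_min + 1
  sum(per_period)
}

full_window  <- count_gmm_instruments(19, lag_min = 2, lag_max = 99)
short_window <- count_gmm_instruments(19, lag_min = 2, lag_max = 3)

c(full_window = full_window, short_window = short_window)
```

Under these assumptions lag(debt, 2:99) yields 153 instruments for debt alone, which is more than the roughly 120 countries in the panel, while a narrower window such as lag(debt, 2:3) yields 33. An instrument count above the number of groups is one known way to end up with a singular GMM weighting matrix, so this may be worth ruling out.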
I searched for this problem, and high correlation between the variables often seemed to be the obstacle. The following table shows the correlations between the variables.
            debt        cpi         edu         gdp         exp         pol
debt  1.00000000 -0.1000317  0.06941532  0.01582022  0.15649933  0.03183785
cpi  -0.10003172  1.0000000 -0.54167403  0.03139960 -0.51025570 -0.78065946
edu   0.06941532 -0.5416740  1.00000000  0.04745409  0.38184303  0.49614498
gdp   0.01582022  0.0313996  0.04745409  1.00000000  0.02357436 -0.09799053
exp   0.15649933 -0.5102557  0.38184303  0.02357436  1.00000000  0.52357420
pol   0.03183785 -0.7806595  0.49614498 -0.09799053  0.52357420  1.00000000
There are indeed a few high values, so I ran the regression again, leaving out individual variables, but the warnings still appeared.
In case the data itself is the problem, here are a few example lines from the CSV file:
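Pairwise correlations cannot rule out exact linear dependence, though: a column can be a perfect combination of several others (for example of the year dummies that effect = "twoways" adds) while every single pairwise correlation stays moderate. Here is a minimal sketch in base R of a rank check on made-up data. The names toy, x and z and all values are hypothetical, not the real data.

```r
# Toy panel: 3 hypothetical countries observed over 4 years (made-up values).
toy <- expand.grid(country = c("A", "B", "C"), year = 2001:2004)
set.seed(1)
toy$x <- rnorm(nrow(toy))    # varies across countries and years
toy$z <- toy$year - 2000     # varies over years only

# Design matrix with year dummies, similar to what a two-way model includes.
X <- model.matrix(~ x + z + factor(year), data = toy)

# A rank below the number of columns signals exact collinearity.
rank_check <- c(columns = ncol(X), rank = qr(X)$rank)
rank_check
```

In this toy case the rank comes out one below the column count, because z is a linear combination of the intercept and the year dummies. The same comparison could be run on the real regressors to see whether any of them is absorbed by the time effects.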
"","country","year","debt","cpi","edu","gdp","exp","pol"
"3","Albania","2002",4.16044436392662,7.5,4.29374171980631,7.60190195987517,2.41323161308111,3.21
"4","Albania","2003",4.09767235231478,7.5,4.32585302986794,7.60240133566582,2.38784493694487,3.19
"5","Albania","2004",4.0517849478033,7.5,4.31988523813603,7.60290046220476,2.39607543608138,3.07
"6","Albania","2005",4.06388535473739,7.6,4.36054760299676,7.60339933974067,2.38508631450579,2.99
....
"1388","Yemen","2010",3.74714836223791,7.8,3.7716108517114,7.60589000105312,2.47232786758114,1.08
"1389","Yemen","2011",3.82209829790016,7.92,3.81793208202855,7.60638738977265,2.54944517092557,1.07
"1390","Yemen","2012",3.85651029549789,7.7,3.83449380291891,7.60688453121963,2.67138621673062,1.07
"1391","Yemen","2013",3.87535902105655,8.2,3.88424062441569,7.60738142563979,2.57184857992181,1.13
"1393","Zimbabwe","2012",3.81330703248899,8,3.83773040084629,7.60688453121963,3.17971910966701,2.72
"1394","Zimbabwe","2013",3.87743156065853,7.9,3.85248529271195,7.60738142563979,3.12588295801904,2.83
I am sorry if I am overlooking a crucial mistake, but I have been struggling with this problem for a few days now. GMM is completely new territory for me and I would really appreciate some help :)
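One possible lead from the sample rows above: the gdp values are identical for Yemen and Zimbabwe in 2012 and again in 2013, and they change by almost exactly the same small amount from year to year, which looks more like a function of the year than like GDP per capita. Here is a minimal sketch in base R that checks this on a few of the rows shown above. sample_rows is just a hand-copied subset, and the same two checks would have to be run on the full data3 to confirm anything.

```r
# Rows copied by hand from the CSV excerpt above (only the columns needed).
sample_rows <- data.frame(
  country = c("Albania", "Albania", "Yemen", "Yemen", "Zimbabwe", "Zimbabwe"),
  year    = c(2002, 2003, 2012, 2013, 2012, 2013),
  gdp     = c(7.60190195987517, 7.60240133566582,
              7.60688453121963, 7.60738142563979,
              7.60688453121963, 7.60738142563979)
)

# Largest gap between the gdp column and log(year); a value near zero means
# the column is determined by the year alone.
max_gap <- max(abs(sample_rows$gdp - log(sample_rows$year)))
max_gap

# Number of distinct gdp values per year; all ones means there is no
# variation across countries within a year.
distinct_per_year <- tapply(sample_rows$gdp, sample_rows$year,
                            function(x) length(unique(x)))
distinct_per_year
```

If the full data show the same pattern, gdp would be perfectly collinear with the time effects of the "twoways" model, and the column was probably overwritten somewhere during data preparation.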
Greetings from a frustrated student