
I have a case in causal mediation analysis and I want to estimate the treatment effect using GMM (Generalized Method of Moments). I referenced this code, https://github.com/josef-pkt/misc/blob/master/notebooks/ex_gmm_gamma.ipynb, and this question: Issue with using statsmodels.sandbox.regression.gmm.GMM

The following is my code:

import numpy as np
import pandas as pd
from statsmodels.sandbox.regression.gmm import GMM

class GMMAB(GMM):

    def __init__(self, *args, **kwds):
        # set appropriate counts for moment conditions and parameters
        kwds.setdefault('k_moms', 6)
        kwds.setdefault('k_params', 6)
        super(GMMAB, self).__init__(*args, **kwds)


    def momcond(self, params):
        c = params
        y,m = self.endog.T #[y,m]
        x = self.exog.squeeze() # x
        #inst = self.instrument   
        
        g1 = m - c[1] - c[0]*x
        g2 = x*(m - c[1] - c[0]*x)
        g3 = y - c[2] - c[3]*x - c[4]*m - c[5]*m*x
        g4 = x*(y - c[2] - c[3]*x - c[4]*m - c[5]*m*x)
        g5 = m*(y - c[2] - c[3]*x - c[4]*m - c[5]*m*x)
        g6 = m*x*(y - c[2] - c[3]*x - c[4]*m - c[5]*m*x)
        g = np.column_stack((g1, g2, g3, g4, g5, g6))
        return g

beta0 = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
dta = pd.read_csv('mediation_data.csv')
y = np.array(dta.y) # y, m, x: 1-D arrays, length >100,000
m = np.array(dta.m)
s = np.array(dta[['y','m']])
x = np.array(dta.x)
model = GMMAB(endog = s, exog = x, instrument = x, k_moms=6, k_params=6)

model.fit(beta0, maxiter=2, weights_method='hac', optim_method='nm')

I run this code in a Jupyter notebook and the notebook crashes without raising any exception. I have >100,000 observations, with y_i, m_i, x_i for each unit i, so y, m, and x are each 1-dimensional arrays of length at least 100,000.

I do not know whether I have implemented the GMM method in a wrong way, or whether the notebook has run out of memory (memory > 3 GB).
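A quick numpy-only sanity check of the moment conditions (using random data as a stand-in for mediation_data.csv, which is an assumption here) suggests the moment matrix itself cannot be the memory problem: at 100,000 rows and 6 columns of float64 it is only about 4.8 MB.

```python
import numpy as np

# Random stand-in data; mediation_data.csv is not available here (assumption).
rng = np.random.default_rng(0)
nobs = 100_000
x = rng.standard_normal(nobs)
m = 0.5 * x + rng.standard_normal(nobs)
y = 1.0 + 0.3 * x + 0.4 * m + 0.2 * m * x + rng.standard_normal(nobs)

c = np.full(6, 0.1)  # same start values as beta0

# The same six moment conditions as in momcond, written out directly.
r1 = m - c[1] - c[0] * x                            # mediator equation residual
r2 = y - c[2] - c[3] * x - c[4] * m - c[5] * m * x  # outcome equation residual
g = np.column_stack((r1, x * r1, r2, x * r2, m * r2, m * x * r2))

print(g.shape)         # (100000, 6)
print(g.nbytes / 1e6)  # 4.8 (MB)
```

If `model.momcond(beta0).shape` on the real data does not come out as `(nobs, 6)`, the inputs (for example a 2-D `s` or `x`) are the first thing to check.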

Could you please give me some suggestions?

Eugene
  • You mean > 100,000 observations? Only 6 parameters. Are y, m and x 1-dimensional? – Josef Aug 26 '21 at 13:46
  • What does "crashes" mean? Do you get an exception? What's the traceback? – Josef Aug 26 '21 at 13:47
  • I have >100,000 observations, with y_i, m_i, x_i for each unit i. I make 6 moment conditions with 6 parameters. I try to run this code in a jupyter notebook. The notebook crashes and I do not know whether I implement GMM method in a wrong way, or I have run out of memory of the notebook(memory>3G). – Eugene Aug 29 '21 at 03:32
  • y, m and x are 1000000-dimensional and there is no exception. The notebook just crashes. When I load y, m and x in smaller size (like 20-dimensional), it works. – Eugene Aug 29 '21 at 04:02
  • what is `model.momcond(beta0).shape`? – Josef Aug 29 '21 at 19:06
  • I just ran the example with random data. I don't have any problem running this with 100,000 observations in a notebook. One problem in fit is that "hac" needs a maxlag argument, `fit(...., wargs=dict(maxlag=2))` – Josef Aug 29 '21 at 19:08
  • Even with nobs=500,000, memory consumption is not large. I use `dta = pd.DataFrame(np.random.randn(nobs, 3), columns=["y", "m", "x"])` as random data. – Josef Aug 29 '21 at 19:15
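Putting the two points from the comments together, a runnable sketch looks like this; the random data in place of mediation_data.csv and the `maxlag` passed via `wargs` both follow Josef's suggestions, and the seed is an assumption added for reproducibility:

```python
import numpy as np
import pandas as pd
from statsmodels.sandbox.regression.gmm import GMM

class GMMAB(GMM):
    def __init__(self, *args, **kwds):
        # six moment conditions for six parameters (just-identified)
        kwds.setdefault('k_moms', 6)
        kwds.setdefault('k_params', 6)
        super().__init__(*args, **kwds)

    def momcond(self, params):
        c = params
        y, m = self.endog.T
        x = self.exog.squeeze()
        r1 = m - c[1] - c[0] * x                            # mediator equation
        r2 = y - c[2] - c[3] * x - c[4] * m - c[5] * m * x  # outcome equation
        return np.column_stack((r1, x * r1, r2, x * r2, m * r2, m * x * r2))

# Random data of the size from the comments, instead of mediation_data.csv.
np.random.seed(0)
nobs = 100_000
dta = pd.DataFrame(np.random.randn(nobs, 3), columns=["y", "m", "x"])
s = dta[["y", "m"]].to_numpy()
x = dta["x"].to_numpy()

model = GMMAB(endog=s, exog=x, instrument=x)
beta0 = np.full(6, 0.1)
# 'hac' weights require a maxlag, passed through wargs.
res = model.fit(beta0, maxiter=2, weights_method='hac',
                wargs=dict(maxlag=2), optim_method='nm')
print(res.params)
```

With purely random data the estimates should all be near zero; the point is that this runs to completion at 100,000 observations, so the original crash is more likely the missing `maxlag` or the environment than the moment conditions themselves.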

0 Answers