I am trying to fit a Poisson GLM using a stepwise (forward selection) procedure. (I am fully aware of the pitfalls of stepwise procedures, but my supervisor at work insisted on this approach.) After fitting about 19 models I got the error message "Unable to allocate 174. MiB for an array with shape (474609, 48) and data type float64". I am already using 64-bit Python. My training dataset has about 470K observations and 10 explanatory variables.
Is there a way to clear the memory after fitting each model? My code is below:
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

def forward_selected(data, response):
    # Candidate column names (placeholders for the real variable names)
    remaining = ['var1', 'var2', 'var3', 'var4', 'var5', 'var6', 'var7', 'var8', 'var9']
    selected = []
    offset = np.log(data['Exposure'])
    while len(remaining) > 2:
        scores_with_candidates = []
        for candidate in remaining:
            formula = "{} ~ {}".format(response,
                                       ' + '.join(selected + [candidate]))
            # add_treatment_to_formula is my own helper (not shown here)
            formula = add_treatment_to_formula(formula, data, 'Exposure')
            score = smf.glm(formula=formula, data=data,
                            family=sm.families.Poisson(),
                            offset=offset).fit().aic
            scores_with_candidates.append((score, candidate))
            print('Fitting candidate ' + str(candidate) + ' with AIC score: ' + str(score))
        # Lower AIC is better: sort descending so that pop() returns the lowest
        scores_with_candidates.sort(key=lambda x: x[0], reverse=True)
        best_new_score, best_candidate = scores_with_candidates.pop()
        remaining.remove(best_candidate)
        selected.append(best_candidate)
        print('Best new candidate: ' + str(best_candidate) + ' with AIC score ' + str(best_new_score))
    # Refit on the selected variables (not on the last candidate formula tried)
    formula = "{} ~ {}".format(response, ' + '.join(selected))
    formula = add_treatment_to_formula(formula, data, 'Exposure')
    print(formula)
    model = smf.glm(formula=formula, data=data,
                    family=sm.families.Poisson(),
                    offset=offset).fit()
    return model.params
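
For reference, here is a minimal sketch of the kind of cleanup I have in mind for the inner loop: keep only the scalar AIC from each fit, drop the reference to the fitted result, and ask the garbage collector to run before the next fit. The names `score_candidates` and `fit_fn` are hypothetical and not part of statsmodels; `fit_fn` stands in for something like `lambda c: smf.glm(...).fit()`. I am only assuming that the fitted result may be kept alive by reference cycles, so this may or may not be enough to fix the error.

```python
import gc


def score_candidates(candidates, fit_fn):
    """Return [(aic, candidate), ...] while keeping only scalar scores.

    fit_fn is a hypothetical callable: fit_fn(candidate) returns a fitted
    result object exposing an `.aic` attribute.
    """
    scores = []
    for candidate in candidates:
        result = fit_fn(candidate)
        # Store a plain float so nothing from the fitted model is retained.
        scores.append((float(result.aic), candidate))
        # Drop the only reference to the fitted result ...
        del result
        # ... and collect any reference cycles it may have left behind.
        gc.collect()
    return scores
```

The best candidate would then be `min(scores)[1]`, since a lower AIC is better.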