
I am trying to fit a Poisson GLM using a stepwise procedure (I am fully aware of the pitfalls of stepwise procedures, but my supervisor at work insisted on this approach). After fitting about 19 models I got the error message "Unable to allocate 174. MiB for an array with shape (474609, 48) and data type float64". I am already using 64-bit Python. My training dataset has about 470K observations and 10 explanatory variables.

Is there a way to clear the memory after fitting a model? Code below:

import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

def forward_selected(data, response):
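    # Candidate column names (placeholders), given as strings so they can be joined into a formula
    remaining = ['var1', 'var2', 'var3', 'var4', 'var5', 'var6', 'var7', 'var8', 'var9']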
    selected = []
    while len(remaining) > 2:
        scores_with_candidates = []
        for candidate in remaining:
            formula = "{} ~ {} ".format(response,
                                           '+  '.join(selected + [candidate]))
            formula = add_treatment_to_formula(formula, data, 'Exposure')
            score = smf.glm(formula=formula, data=data, 
                            family=sm.families.Poisson(), 
                            offset=np.log(data['Exposure'])).fit().aic
            scores_with_candidates.append((score,candidate))
            print('Fitting candidate ' + str(candidate) + ' with AIC score: ' + str(score))
        scores_with_candidates.sort(key=lambda x: x[0], reverse=True)
        best_new_score, best_candidate = scores_with_candidates.pop()        
        remaining.remove(best_candidate)
        selected.append(best_candidate)        
        print('Best new candidate: ' + str(best_candidate) + ' with AIC score ' + str(best_new_score))
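    # Rebuild the formula from the selected variables; otherwise `formula` still
    # holds the last candidate tried in the loop, which is not necessarily the best one.
    formula = "{} ~ {} ".format(response, '+  '.join(selected))
    formula = add_treatment_to_formula(formula, data, 'Exposure')
    print(formula)
    model = smf.glm(formula=formula, data=data, family=sm.families.Poisson(), offset=np.log(data['Exposure'])).fit()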
    return model.params    

lostwanderer
  • I don't see an obvious problem. You could try calling garbage collection in the loop, in case memory does not get freed fast enough. – Josef Dec 22 '21 at 15:46
  • I agree with @Josef: I think the problem is in the rest of the code. You should be able to allocate 174 MiB of contiguous memory unless your allocation pressure is very high. Try profiling the memory usage of your code. – Morten Jensen Dec 22 '21 at 16:04
  • Thanks. As I am still pretty much a Python noob, can you show me how to call garbage collection and how to profile the memory usage? Many thanks again. – lostwanderer Dec 23 '21 at 17:06
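
Following up on the two suggestions in the comments, here is a minimal sketch of calling the garbage collector inside a fitting loop and watching memory with the standard-library `tracemalloc` module. It assumes nothing about statsmodels: `fit_and_score` is a hypothetical stand-in for the `smf.glm(...).fit().aic` call, and the row and fit counts are arbitrary. The stand-in deliberately creates a reference cycle, which is the kind of garbage that lingers until the cyclic collector runs.

```python
import gc
import tracemalloc


def fit_and_score(n_rows):
    """Hypothetical stand-in for smf.glm(...).fit().aic (not statsmodels code).

    Builds a large temporary object, similar in spirit to a design matrix,
    and returns only a scalar score.
    """
    design = [float(i) for i in range(n_rows)]  # large temporary
    result = {"design": design}
    result["self"] = result  # reference cycle: only the cyclic collector frees it
    return sum(design) / n_rows


def run_loop(n_fits=5, n_rows=200_000):
    """Fit repeatedly, collect garbage after each fit, and record memory use."""
    tracemalloc.start()
    usage = []
    for i in range(n_fits):
        score = fit_and_score(n_rows)
        gc.collect()  # force a full collection now instead of waiting
        current, peak = tracemalloc.get_traced_memory()  # bytes traced since start()
        usage.append(current)
        print(f"fit {i}: score={score:.1f} "
              f"current={current / 2**20:.2f} MiB peak={peak / 2**20:.2f} MiB")
    tracemalloc.stop()
    return usage


usage = run_loop()
```

In the function from the question, the equivalent change would be an `import gc` at the top and a `gc.collect()` call right after the line that computes `score`. If the "current" figure keeps growing from one fit to the next even with the explicit collection, something is still holding references to old results. Whether this resolves the allocation error here is not certain; it is just the cheapest thing to try first.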
